Fungal proteases

ABSTRACT

The present invention provides fungal proteases and improved fungal strains that are deficient in protease production.

The present application is a Divisional of co-pending U.S. patent application Ser. No. 14/950,712, filed Nov. 24, 2015, which is a Divisional of U.S. patent application Ser. No. 13/598,051, filed Aug. 29, 2012, which claims priority to U.S. Prov. Pat. Appln. Ser. No. 61/541,327, filed Sep. 30, 2011, and U.S. Prov. Pat. Appln. Ser. No. 61/564,107, filed Nov. 28, 2011, all of which are incorporated by reference in their entireties for all purposes.

REFERENCE TO A “SEQUENCE LISTING,” A TABLE, OR A COMPUTER PROGRAM LISTING APPENDIX SUBMITTED AS AN ASCII TEXT FILE

The Sequence Listing written in file CX35-108US1_ST25.TXT, created on Aug. 28, 2012, 461,224 bytes, machine format IBM-PC, MS-Windows operating system, is hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention provides fungal proteases and improved fungal strains that are deficient in protease production.

BACKGROUND

Proteases find use in various settings where the degradation of protein compositions is desirable. Proteases, also referred to as “proteinases” and “proteolytic enzymes,” catalyze the breakdown of peptide bonds within proteins. Different types of proteases hydrolyze different types of peptide bonds. Proteolytic enzymes play important roles in fungal development and physiology. Secreted proteases are required for survival and growth of various fungal species, and these enzymes play roles in accessing a variety of substrates during intracellular protein turnover, processing, translocation, sporulation, germination, and differentiation. In addition, fungal proteases are widely used in biotechnology, mainly in areas such as food processing, leather processing, and in detergent compositions, as well as in bioremediation compositions and in the production of therapeutic peptides.

SUMMARY OF THE INVENTION

The present invention provides fungal proteases and improved fungal strains that are deficient in protease production.

The present invention provides proteases comprising the polypeptide sequences set forth in SEQ ID NOS:3, 6, 9, and/or 12, and biologically active fragments thereof. In some embodiments, the proteases are fungal proteases. The present invention also provides polynucleotide sequences encoding the proteases. In some embodiments, the present invention provides polynucleotide sequences encoding the fungal proteases provided herein. In some embodiments, the polynucleotide sequence is selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11, and/or a fragment and/or fusion of SEQ ID NOS: 1, 2, 4, 5, 7, 8, 10, and/or 11. In some additional embodiments, the present invention provides isolated polynucleotide sequences encoding at least one protease, wherein the polynucleotide hybridizes to the full length complement of SEQ ID NO: 1, 2, 4, 5, 7, 8, 10, and/or 11, under stringent hybridization conditions. In some additional embodiments, the present invention provides isolated polynucleotides obtainable from a filamentous fungus. In some embodiments, the filamentous fungus is Myceliophthora thermophila.

The present invention also provides vectors comprising at least one polynucleotide sequence encoding at least one protease, as provided herein. In some embodiments, the polynucleotide sequence is operably linked to regulatory sequences suitable for expression of the polynucleotide sequence in a suitable host cell. In some embodiments, the host cell is a prokaryotic cell, while in some other embodiments, it is a eukaryotic cell. In some further embodiments, the host cell is a yeast or filamentous fungal cell. In some embodiments, the host cell is Myceliophthora thermophila. In some embodiments, the host cells comprising at least one vector as provided herein are prokaryotic or eukaryotic cells. In some embodiments, the host cell is a yeast or filamentous fungal cell. In some embodiments, the host cell is Myceliophthora thermophila.

The present invention also provides isolated Myceliophthora strains deficient in at least one protease native to Myceliophthora, wherein the protease comprises an amino acid sequence having at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% identity with a polypeptide sequence set forth in SEQ ID NO: 3, 6, 9, and/or 12. In some embodiments, the Myceliophthora is Myceliophthora thermophila. In some additional embodiments, the Myceliophthora produces at least one enzyme. In some further embodiments, the Myceliophthora produces at least one cellulase. In still some further embodiments, the Myceliophthora produces at least one enzyme selected from beta-glucosidases, endoglucanases, cellobiohydrolases, cellobiose dehydrogenases, endoxylanases, beta-xylosidases, xylanases, arabinofuranosidases, alpha-glucuronidases, acetylxylan esterases, feruloyl esterases, alpha-glucuronyl esterases, lipases, amylases, glucoamylases, and/or proteases. In some additional embodiments, the Myceliophthora produces at least one recombinant cellulase and/or non-cellulase, while in some other embodiments, the Myceliophthora produces at least two recombinant cellulases and/or non-cellulases, and in still some additional embodiments, the Myceliophthora produces at least three recombinant cellulases and/or non-cellulases. In some embodiments, the cellulase is a recombinant cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some embodiments, the cellulase is a recombinant Myceliophthora cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some additional embodiments, the cellulase is a recombinant cellulase selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, and/or BGL.

The present invention also provides compositions comprising the isolated Myceliophthora provided herein. The present invention also provides compositions comprising the isolated Myceliophthora thermophila provided herein. In some embodiments, the present invention provides compositions comprising at least one of the enzymes produced by at least one isolated Myceliophthora provided herein. In some embodiments, the present invention provides compositions comprising at least one of the enzymes produced by at least one isolated Myceliophthora thermophila provided herein.

The present invention also provides methods for producing the Myceliophthora described herein, comprising providing a Myceliophthora having protease activity, wherein the protease comprises an amino acid sequence having at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% identity with the polypeptide sequence set forth in SEQ ID NO: 3, 6, 9, and/or 12; and mutating the Myceliophthora under conditions such that the protease is mutated to produce a protease-deficient Myceliophthora. In some embodiments, the present invention provides methods for producing the Myceliophthora thermophila described herein, comprising providing a Myceliophthora thermophila having protease activity, wherein the protease comprises an amino acid sequence having at least about 70%, at least about 75%, at least about 80%, at least about 81%, at least about 82%, at least about 83%, at least about 84%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or at least about 100% identity with the polypeptide sequence set forth in SEQ ID NO: 3, 6, 9, and/or 12; and mutating the Myceliophthora thermophila under conditions such that the protease is mutated to produce a protease-deficient Myceliophthora thermophila.

The present invention also provides methods for producing at least one enzyme, comprising providing Myceliophthora, under conditions such that at least one enzyme is produced by the Myceliophthora. In some embodiments, the at least one enzyme comprises at least one recombinant enzyme. In some further embodiments, the at least one enzyme comprises at least one recombinant cellulase, at least two recombinant cellulases, at least three recombinant cellulases, at least four recombinant cellulases, and/or at least five recombinant cellulases. In some embodiments, the cellulase is selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some additional embodiments, the cellulase is a Myceliophthora cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some further embodiments, the cellulase is selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, and/or BGL. In still some additional embodiments, the Myceliophthora further produces at least one additional enzyme (e.g., a non-cellulase enzyme). In some embodiments, at least one additional enzyme is a recombinant non-cellulase enzyme. In still additional embodiments, at least one non-cellulase enzyme is a Myceliophthora non-cellulase enzyme. In some embodiments, at least one non-cellulase enzyme comprises at least one endoxylanase, beta-xylosidase, xylanase, arabinofuranosidase, alpha-glucuronidase, acetylxylan esterase, feruloyl esterase, alpha-glucuronyl esterase, lipase, amylase, glucoamylase, and/or protease.

The present invention also provides methods for producing at least one enzyme, comprising providing Myceliophthora thermophila, under conditions such that at least one enzyme is produced by the M. thermophila. In some embodiments, the at least one enzyme comprises at least one recombinant enzyme. In some further embodiments, the at least one enzyme comprises at least one recombinant cellulase, at least two recombinant cellulases, at least three recombinant cellulases, at least four recombinant cellulases, and/or at least five recombinant cellulases. In some embodiments, the cellulase is selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some additional embodiments, the cellulase is a M. thermophila cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some further embodiments, the cellulase is selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, and/or BGL. In still some additional embodiments, the M. thermophila further produces at least one additional enzyme (e.g., a non-cellulase enzyme). In some embodiments, at least one non-cellulase enzyme is a recombinant non-cellulase enzyme. In still additional embodiments, at least one non-cellulase enzyme is a M. thermophila non-cellulase enzyme. In some embodiments, at least one non-cellulase enzyme comprises at least one endoxylanase, beta-xylosidase, xylanase, arabinofuranosidase, alpha-glucuronidase, acetylxylan esterase, feruloyl esterase, alpha-glucuronyl esterase, lipase, amylase, glucoamylase, and/or protease.

The present invention also provides compositions comprising at least one enzyme produced using at least one of the methods provided herein. In some embodiments, the compositions further comprise at least one enzyme produced by Myceliophthora. In some embodiments, at least one enzyme is a Myceliophthora enzyme produced by a protease-deficient Myceliophthora strain. In some further embodiments, the at least one enzyme is a recombinant enzyme. In still some additional embodiments, the compositions comprise at least one enzyme selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some embodiments, the compositions comprise at least one enzyme, wherein the enzyme is a Myceliophthora cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some embodiments, the compositions comprise at least one cellulase selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, and/or BGL. In some additional embodiments, the compositions comprise at least one non-cellulase enzyme. In some embodiments, the cellulase-containing compositions further comprise at least one non-cellulase enzyme. In some embodiments, the non-cellulase enzyme is a recombinant non-cellulase enzyme. In some embodiments, the compositions comprise at least one non-cellulase enzyme selected from at least one lipase, amylase, glucoamylase, and/or protease.

The present invention also provides saccharification methods comprising (a) providing a biomass and Myceliophthora, (b) culturing the Myceliophthora provided herein under conditions in which at least one enzyme is secreted into a culture broth, and (c) combining the broth and biomass under conditions such that saccharification occurs, where (b) may take place before or simultaneously with (c). The present invention also provides saccharification methods comprising combining at least one composition provided herein and biomass under conditions such that saccharification occurs. The present invention further provides saccharification methods comprising combining any of the enzymes produced as provided herein with biomass, under conditions such that saccharification occurs. In some embodiments, the M. thermophila does not produce at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4, as provided herein. In some embodiments, the Myceliophthora does not produce at least one polypeptide selected from SEQ ID NOS: 3, 6, 9, and/or 12. In some embodiments, the gene encoding at least one protease selected from the genes encoding Protease #1, Protease #2, Protease #3, and/or Protease #4 has been deleted from the Myceliophthora. In some embodiments, at least one polynucleotide sequence selected from SEQ ID NOS: 1, 2, 4, 5, 7, 8, 10, and/or 11 is deleted from the genome of the Myceliophthora.

The present invention also provides saccharification methods comprising (a) providing a biomass and Myceliophthora thermophila, (b) culturing the Myceliophthora thermophila provided herein under conditions in which at least one enzyme is secreted into a culture broth, and (c) combining the broth and biomass under conditions such that saccharification occurs, where (b) may take place before or simultaneously with (c). The present invention also provides saccharification methods comprising combining at least one composition provided herein and biomass under conditions such that saccharification occurs. The present invention further provides saccharification methods comprising combining any of the enzymes produced as provided herein with biomass, under conditions such that saccharification occurs. In some embodiments, the Myceliophthora thermophila does not produce at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4, as provided herein. In some embodiments, the Myceliophthora thermophila does not produce at least one polypeptide selected from SEQ ID NOS: 3, 6, 9, and/or 12. In some embodiments, the gene encoding at least one protease selected from the genes encoding Protease #1, Protease #2, Protease #3, and/or Protease #4 has been deleted from the Myceliophthora thermophila. In some embodiments, at least one polynucleotide sequence selected from SEQ ID NOS: 1, 2, 4, 5, 7, 8, 10, and/or 11 is deleted from the genome of the Myceliophthora thermophila.

The present invention also provides saccharification methods comprising (a) providing a biomass and Myceliophthora, (b) culturing the Myceliophthora provided herein under conditions in which at least one enzyme is secreted into a culture broth, (c) recovering at least one cellulase and/or non-cellulase enzyme from the broth, and (d) combining the recovered cellulase enzyme and/or at least one non-cellulase enzyme and biomass under conditions such that saccharification occurs. The present invention also provides saccharification methods comprising combining at least one composition provided herein and biomass under conditions such that saccharification occurs. The present invention further provides saccharification methods comprising combining any of the enzymes produced as provided herein with biomass, under conditions such that saccharification occurs. In some embodiments, the Myceliophthora does not produce at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4, as provided herein. In some embodiments, the Myceliophthora does not produce at least one polypeptide selected from SEQ ID NOS:3, 6, 9, and/or 12. In some embodiments, the gene encoding at least one protease selected from the genes encoding Protease #1, Protease #2, Protease #3, and/or Protease #4 has been deleted from the Myceliophthora. In some embodiments, at least one polynucleotide sequence selected from SEQ ID NOS: 1, 2, 4, 5, 7, 8, 10, and/or 11 has been deleted from the genome of the Myceliophthora.

The present invention also provides saccharification methods comprising (a) providing a biomass and Myceliophthora thermophila, (b) culturing the Myceliophthora thermophila provided herein under conditions in which at least one enzyme is secreted into a culture broth, (c) recovering at least one cellulase and/or non-cellulase enzyme from the broth, and (d) combining the recovered cellulase enzyme and/or at least one non-cellulase enzyme and biomass under conditions such that saccharification occurs. The present invention also provides saccharification methods comprising combining at least one composition provided herein and biomass under conditions such that saccharification occurs. The present invention further provides saccharification methods comprising combining any of the enzymes produced as provided herein with biomass, under conditions such that saccharification occurs. In some embodiments, the Myceliophthora thermophila does not produce at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4, as provided herein. In some embodiments, the Myceliophthora thermophila does not produce at least one polypeptide selected from SEQ ID NOS: 3, 6, 9, and/or 12. In some embodiments, the gene encoding at least one protease selected from the genes encoding Protease #1, Protease #2, Protease #3, and/or Protease #4 has been deleted from the Myceliophthora thermophila. In some embodiments, at least one sequence selected from SEQ ID NOS: 1, 2, 4, 5, 7, 8, 10, and/or 11 has been deleted from the genome of the Myceliophthora thermophila.

The present invention also provides isolated fungal proteases comprising amino acid sequences that are at least about 75%, at least about 80%, at least about 85%, at least about 86%, at least about 87%, at least about 88%, at least about 89%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identical to any of SEQ ID NOS:3, 6, 9, and/or 12 or a biologically active fragment of any of SEQ ID NOS:3, 6, 9, and/or 12, wherein the amino acid sequence of the protease is numbered with reference to SEQ ID NO:3. In some embodiments, the fungal proteases comprise the polypeptide sequence(s) set forth in SEQ ID NOS:3, 6, 9, and/or 12, or a biologically active fragment thereof.
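By way of illustration only, the percent identity thresholds recited above can be checked computationally once a candidate sequence has been aligned to a reference sequence. The following is a minimal sketch, not a definition adopted by this specification. It assumes the two sequences are already aligned to equal length with "-" marking gaps, and it assumes identity is counted over all aligned columns; other conventions for the denominator exist. The function names and the short peptide strings are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: both strings come from the same pairwise alignment, so
    they have equal length and use '-' for gaps. Columns where both
    sequences have a gap are ignored; every other column counts toward
    the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column matches only when both residues are identical and not a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_query: str, aligned_ref: str,
                             threshold_pct: float) -> bool:
    """True when the query is at least threshold_pct identical to the reference."""
    return percent_identity(aligned_query, aligned_ref) >= threshold_pct


# Hypothetical 10-residue toy peptides differing at one position.
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQR"
identity = percent_identity(candidate, reference)
```

In this sketch a candidate would satisfy an "at least about 75%" embodiment when `meets_identity_threshold(candidate, reference, 75.0)` returns true. In practice the alignment itself would be produced by an established alignment tool before this calculation is applied.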

The present invention also provides isolated polynucleotide sequences encoding the fungal proteases provided herein. In some embodiments, the isolated polynucleotide sequences comprise at least one sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11, and/or a fragment and/or fusion of SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11. In some additional embodiments, the polynucleotides hybridize to the full length complement of SEQ ID NO:1, 2, 4, 5, 7, 8, 10, and/or 11, under stringent hybridization conditions. In some additional embodiments, the isolated polynucleotides are obtainable from a filamentous fungus. In some further embodiments, the filamentous fungus is Myceliophthora. In still some additional embodiments, the filamentous fungus is Myceliophthora thermophila.

The present invention also provides vectors comprising at least one polynucleotide sequence encoding at least one protease provided herein. In some embodiments, the isolated polynucleotide sequences comprise at least one sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11, and/or a fragment and/or fusion of SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11. In some additional embodiments, the polynucleotides hybridize to the full length complement of SEQ ID NO:1, 2, 4, 5, 7, 8, 10, and/or 11, under stringent hybridization conditions. In some additional embodiments, the isolated polynucleotides are obtainable from a filamentous fungus. In some further embodiments, the filamentous fungus is Myceliophthora. In still some additional embodiments, the filamentous fungus is Myceliophthora thermophila. In some embodiments, the polynucleotide sequence(s) comprising the vector is operably linked to regulatory sequences suitable for expression of the polynucleotide sequence in a suitable host cell. In some embodiments, the host cell is a prokaryotic or eukaryotic cell. In some further embodiments, the host cell is a eukaryotic cell. In some additional embodiments, the host cell is a yeast or filamentous fungal cell. In some embodiments, the host cell is Myceliophthora. In some further embodiments, the host cell is Myceliophthora thermophila.

The present invention further provides host cells comprising at least one vector as provided herein. In some embodiments, the host cell is a prokaryotic or eukaryotic cell. In some further embodiments, the host cell is a eukaryotic cell. In some additional embodiments, the host cell is a yeast or filamentous fungal cell. In some embodiments, the host cell is Myceliophthora. In some further embodiments, the host cell is Myceliophthora thermophila. In some embodiments, the isolated polynucleotide sequences of the vectors comprise at least one sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11, and/or a fragment and/or fusion of SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11. In some additional embodiments, the polynucleotides hybridize to the full length complement of SEQ ID NO:1, 2, 4, 5, 7, 8, 10, and/or 11, under stringent hybridization conditions. In some additional embodiments, the isolated polynucleotides are obtainable from a filamentous fungus. In some further embodiments, the filamentous fungus is Myceliophthora. In still some additional embodiments, the filamentous fungus is Myceliophthora thermophila.

The present invention also provides isolated Myceliophthora deficient in at least one protease native to Myceliophthora, wherein the protease comprises an amino acid sequence having at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or about 100% identity with the polypeptide sequence set forth in SEQ ID NO:3, 6, 9, and/or 12. In some embodiments, the protease comprises an amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity with the polypeptide sequence set forth in SEQ ID NO:3, 6, 9, and/or 12. In some embodiments, the Myceliophthora is Myceliophthora thermophila. In some additional embodiments, the Myceliophthora produces at least one enzyme. In some embodiments, the Myceliophthora provided herein produces at least one cellulase. In some further embodiments, the Myceliophthora produces at least one cellulase selected from beta-glucosidases, endoglucanases, cellobiohydrolases, cellobiose dehydrogenases, xylanases, beta-xylosidases, arabinofuranosidases, alpha-glucuronidases, acetylxylan esterases, feruloyl esterases, alpha-glucuronyl esterases, laccases, and/or peroxidases. In some embodiments, the Myceliophthora produces at least one recombinant cellulase, while in some alternative embodiments the Myceliophthora produces at least two recombinant cellulases, and in some further embodiments, the Myceliophthora produces at least three, four, five, or more recombinant cellulases. In some embodiments, the recombinant cellulase comprises a recombinant cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some additional embodiments, the cellulase comprises a recombinant Myceliophthora cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some further embodiments, the cellulase is a recombinant cellulase selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, and/or BGL. In some additional embodiments, the Myceliophthora further produces at least one non-cellulase enzyme. In some embodiments, the Myceliophthora produces at least one non-cellulase enzyme comprising at least one lipase, amylase, glucoamylase, protease, oxidase, and/or reductase. In some additional embodiments, the Myceliophthora produces two, three, four, or more non-cellulase enzymes.

The present invention also provides compositions comprising the Myceliophthora provided herein. The present invention also provides compositions comprising at least one enzyme produced by the Myceliophthora provided herein. In some embodiments, the Myceliophthora is Myceliophthora thermophila. The present invention also provides compositions comprising Myceliophthora thermophila. In some embodiments, the compositions comprise at least one additional enzyme produced by at least one Myceliophthora provided herein. In some further embodiments, the compositions further comprise at least one additional enzyme produced by any suitable organism, including but not limited to any suitable eukaryotic and/or prokaryotic organisms. In some further embodiments, the compositions further comprise at least one additional suitable organism, including but not limited to eukaryotic and prokaryotic organisms. In some embodiments, the additional organism is selected from yeast, filamentous fungi, and bacteria.

The present invention further provides methods for producing the Myceliophthora provided herein, comprising providing a Myceliophthora having protease activity, wherein the protease comprises at least one amino acid sequence having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% identity with at least one polypeptide sequence set forth in SEQ ID NO:3, 6, 9, and/or 12; and mutating the Myceliophthora under conditions such that a protease-deficient Myceliophthora is produced. The present invention further provides methods for producing the Myceliophthora provided herein, comprising providing a Myceliophthora having protease activity, wherein the protease comprises at least one amino acid sequence having at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, or at least about 99% identity with at least one polypeptide sequence set forth in SEQ ID NO:3, 6, 9, and/or 12; and mutating the Myceliophthora under conditions such that a protease-deficient Myceliophthora is produced. It is not intended that the protease-deficient Myceliophthora be produced using any particular methods, as it is intended that any suitable method for production of protease-deficient fungal organisms will find use in the present invention. In some embodiments, the Myceliophthora is Myceliophthora thermophila.

The present invention also provides methods for producing at least one enzyme, comprising providing the Myceliophthora provided herein, under conditions such that at least one enzyme is produced by the Myceliophthora. In some embodiments, at least one enzyme produced by the isolated Myceliophthora comprises at least one recombinant enzyme. In some embodiments, at least one enzyme comprises at least one recombinant cellulase, while in some alternative embodiments the methods provide at least two recombinant cellulases, and in some further embodiments, the methods provide at least three, four, or five or more recombinant cellulases. In some embodiments, the cellulase is selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some further embodiments, the cellulase is a Myceliophthora cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some additional embodiments, the cellulase is selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, and/or BGL. In some embodiments, the Myceliophthora further produces at least one non-cellulase enzyme. In some additional embodiments, the non-cellulase enzyme(s) is/are recombinant non-cellulase enzyme(s). In some further embodiments, the non-cellulase enzyme(s) comprise at least one lipase, amylase, glucoamylase, protease, oxidase, and/or reductase. In some additional embodiments, the Myceliophthora produces two, three, four, or more non-cellulase enzymes. In some embodiments, the Myceliophthora is Myceliophthora thermophila.

The present invention also provides compositions comprising at least one enzyme produced using at least one method provided herein. In some embodiments, the composition further comprises Myceliophthora. In some additional embodiments, the compositions comprise at least one Myceliophthora enzyme. In some further embodiments, at least one enzyme is a recombinant enzyme. In some additional embodiments, at least one enzyme is selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some embodiments, the compositions comprise at least one enzyme comprising at least one Myceliophthora cellulase selected from beta-glucosidases (BGLs), Type 1 cellobiohydrolases (CBH1s), Type 2 cellobiohydrolases (CBH2s), glycoside hydrolase 61s (GH61s), and/or endoglucanases (EGs). In some embodiments, the cellulase is selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, and/or BGL. In some additional embodiments, the Myceliophthora is Myceliophthora thermophila. In some further embodiments, the compositions further comprise at least one non-cellulase enzyme. In some embodiments, at least one non-cellulase enzyme is a recombinant non-cellulase enzyme. In some further embodiments, the non-cellulase enzyme(s) comprise at least one lipase, amylase, glucoamylase, protease, oxidase, and/or reductase. In some additional embodiments, the Myceliophthora produces two, three, four, or more non-cellulase enzymes. In some embodiments, the Myceliophthora is Myceliophthora thermophila.

The present invention also provides saccharification methods comprising (a) providing biomass and protease-deficient Myceliophthora as provided herein in a culture broth, (b) culturing the protease-deficient Myceliophthora under conditions in which at least one enzyme is secreted by the Myceliophthora into the culture broth to provide an enzyme-containing broth, and (c) combining the enzyme-containing broth and the biomass under conditions such that saccharification occurs, where (b) may take place before or simultaneously with (c). In some embodiments, the saccharification methods comprise combining at least one composition as provided herein and biomass under conditions such that saccharification occurs. In some further embodiments, fermentable sugars are produced during saccharification.

The present invention also provides methods for producing a fermentable sugar from at least one cellulosic substrate, comprising contacting the cellulosic substrate with at least one enzyme selected from beta-glucosidase (Bgl), at least one endoglucanase (EG), at least one type 2b cellobiohydrolase (CBH2b), at least one glycoside hydrolase 61 (GH61), and/or at least one CBH1a produced by at least one protease-deficient Myceliophthora provided herein, under conditions in which the fermentable sugar is produced.

The present invention also provides methods of producing at least one end-product from at least one cellulosic substrate, the method comprising: (a) contacting the cellulosic substrate with at least one enzyme selected from beta-glucosidase (Bgl), at least one endoglucanase (EG), at least one type 2b cellobiohydrolase (CBH2b), at least one glycoside hydrolase 61 (GH61), and/or at least one CBH1a produced by the protease-deficient Myceliophthora provided herein, under conditions in which fermentable sugars are produced; and (b) contacting the fermentable sugars with a microorganism in a fermentation to produce the end-product. In some embodiments, the cellulosic substrate is pretreated prior to step (a). In some embodiments, at least one end product comprises at least one fermentation end product. In some embodiments, the methods further comprise recovering at least one end product. In some additional embodiments, the fermentation end product is selected from alcohols, organic acids, diols, fatty acids, lactic acid, acetic acid, 3-hydroxypropionic acid, acrylic acid, succinic acid, citric acid, malic acid, fumaric acid, amino acids, 1,3-propanediol, ethylene, glycerol, fatty alcohols, butadiene, and beta-lactams. In some embodiments, the fermentation end product is at least one alcohol selected from ethanol and butanol. In some further embodiments, the alcohol is ethanol. In some additional embodiments, the microorganism is a yeast. In some embodiments, the yeast is Saccharomyces.

The present invention also provides use of at least one protease-deficient Myceliophthora provided herein and/or at least one composition as provided herein, to produce at least one fermentation end product. In some embodiments, the present invention also provides use of at least one protease-deficient Myceliophthora provided herein and/or at least one composition provided herein to produce at least one fermentation end product selected from alcohols, fatty acids, lactic acid, acetic acid, 3-hydroxypropionic acid, acrylic acid, citric acid, malic acid, fumaric acid, succinic acid, amino acids, 1,3-propanediol, ethylene, glycerol, butadiene, fatty alcohols, and beta-lactams. In some embodiments, the fermentation end product is an alcohol selected from ethanol and butanol. In some further embodiments, the alcohol is ethanol.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 provides a map of the construct C1V16-1809.g1.

FIG. 2 provides a map of the construct pUC19-690.g5.

DESCRIPTION OF THE INVENTION

The present invention provides fungal proteases and improved fungal strains that are deficient in protease production.

In some embodiments, the improved fungal strains find use in hydrolyzing cellulosic material to glucose. In some embodiments, the improved fungal strains find use in hydrolyzing lignocellulose material. As indicated herein, the present invention provides improved fungal strains for the conversion of cellulose to fermentable sugars (e.g., glucose). In particular, the improved fungal strains provided herein are genetically modified to reduce the amount of endogenous protease activity secreted by the cells. The present invention also provides purified enzymes produced by the improved fungal strains provided herein.

Fungi are particularly suitable for large scale production of useful proteins, particularly proteins that are secreted from cells. Proteolytic enzymes play roles in these production processes, as they are generally required for proper processing of proteins and the metabolic health of the host organism. However, proteolytic degradation can sometimes result in decreased yields of secreted proteins. In addition, separation of intact from cleaved proteins, particularly on a large scale, is challenging and time-consuming. Thus, in some situations it is desirable to attenuate protease production and/or activity. Means to achieve this attenuation include, but are not limited to, deleting (i.e., knocking out) the genes encoding proteases that are problematic in protein production.

The present invention provides novel proteases obtained from Myceliophthora thermophila, as well as M. thermophila strains that are deficient in the production of at least one protease.

DEFINITIONS

Unless otherwise indicated, the practice of the present invention involves conventional techniques commonly used in molecular biology, protein engineering, microbiology, and fermentation science, which are within the skill of the art. Such techniques are well-known and described in numerous texts and reference works well known to those of skill in the art. All patents, patent applications, articles and publications mentioned herein, both supra and infra, are hereby expressly incorporated herein by reference.

Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Many technical dictionaries are known to those of skill in the art. Although any suitable methods and materials similar or equivalent to those described herein find use in the practice of the present invention, some methods and materials are described herein. It is to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary, depending upon the context in which they are used by those of skill in the art. Accordingly, the terms defined immediately below are more fully described by reference to the application as a whole.

Also, as used herein, the singular “a”, “an,” and “the” include the plural references, unless the context clearly indicates otherwise. Numeric ranges are inclusive of the numbers defining the range. Thus, every numerical range disclosed herein is intended to encompass every narrower numerical range that falls within such broader numerical range, as if such narrower numerical ranges were all expressly written herein. It is also intended that every maximum (or minimum) numerical limitation disclosed herein includes every lower (or higher) numerical limitation, as if such lower (or higher) numerical limitations were expressly written herein. Furthermore, the headings provided herein are not limitations of the various aspects or embodiments of the invention which can be had by reference to the application as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the application as a whole. Nonetheless, in order to facilitate understanding of the invention, a number of terms are defined below. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively.

As used herein, the term “comprising” and its cognates are used in their inclusive sense (i.e., equivalent to the term “including” and its corresponding cognates).

As used herein, “protease” includes enzymes that hydrolyze peptide bonds (peptidases), as well as enzymes that hydrolyze bonds between peptides and other moieties, such as sugars (glycopeptidases). Many proteases are characterized under EC 3.4, and are suitable for use in the present invention. Some specific types of proteases include, but are not limited to, cysteine proteases including pepsin, papain and serine proteases including chymotrypsins, carboxypeptidases and metalloendopeptidases.

As used herein, the term “protease-deficient” refers to microbial strains, in particular fungal strains (e.g., M. thermophila) that produce reduced levels or no endogenous or heterologous proteases. In some embodiments, the strains do not produce at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4, as provided herein. In some embodiments, the M. thermophila does not produce at least one polypeptide selected from SEQ ID NOS:3, 6, 9, and/or 12. In some embodiments, the gene encoding at least one protease selected from the genes encoding Protease #1, Protease #2, Protease #3, and/or Protease #4 has been deleted from the M. thermophila. In some embodiments, at least one polynucleotide sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11 has been deleted from the genome of the M. thermophila. In some additional embodiments, at least one polynucleotide sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11 has been mutated, such that the M. thermophila produces a reduced level of at least one protease (e.g., Protease #1, Protease #2, Protease #3, and/or Protease #4), as compared to a M. thermophila in which SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11 have not been mutated. In some embodiments, at least one polynucleotide sequence or a portion thereof selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11 is expressed by M. thermophila, but reduced levels or no detectable levels of at least one protease (e.g., Protease #1, Protease #2, Protease #3, and/or Protease #4) are produced. It is also intended that the term be used to indicate that a strain is deficient in the production of a specific protease but not other protease(s). Thus, in some embodiments, the strain is deficient in the production of at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4, but is not deficient in production of at least one additional protease, including but not limited to endogenous and/or heterologous protease(s).

As used herein, “substrate” refers to a substance or compound that is converted or designated for conversion into another compound (e.g., a product) by the action of an enzyme. The term includes not only a single compound but also combinations of compounds, such as solutions, mixtures and other materials which contain at least one substrate.

As used herein, “conversion” refers to the enzymatic transformation of a substrate to the corresponding product. “Percent conversion” refers to the percent of the substrate that is converted to the product within a period of time under specified conditions.
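By way of illustration only, the definition of “percent conversion” can be expressed as a simple calculation. The following is a minimal sketch, assuming that substrate amounts are measured in the same units at the start and end of the specified period; the function name and parameters are hypothetical and are not part of the invention.

```python
def percent_conversion(substrate_initial: float, substrate_remaining: float) -> float:
    """Percent of the initial substrate converted to product.

    Hypothetical helper: both amounts must be in the same units
    (e.g., g/L) and measured under the same specified conditions.
    """
    if substrate_initial <= 0:
        raise ValueError("initial substrate amount must be positive")
    if not 0 <= substrate_remaining <= substrate_initial:
        raise ValueError("remaining substrate must lie between 0 and the initial amount")
    converted = substrate_initial - substrate_remaining
    return 100.0 * converted / substrate_initial


# Example: 50 units of substrate at the start, 12.5 units left at the end.
print(percent_conversion(50.0, 12.5))
```

The sketch attributes all substrate loss to conversion; where side reactions or degradation occur, the amount of product formed would be measured instead.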

The terms “polynucleotide” and “nucleic acid”, used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. These terms include, but are not limited to, single-, double- or triple-stranded DNA, genomic DNA, cDNA, RNA, DNA-RNA hybrid, polymers comprising purine and pyrimidine bases, and/or other natural, chemically, biochemically modified, non-natural or derivatized nucleotide bases. The following are non-limiting examples of polynucleotides: genes, gene fragments, chromosomal fragments, ESTs, exons, introns, mRNA, tRNA, rRNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. In some embodiments, polynucleotides comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs, uracil, other sugars and linking groups such as fluororibose and thioate, and/or nucleotide branches. In some alternative embodiments, the sequence of nucleotides is interrupted by non-nucleotide components.

As used herein, the terms “DNA construct” and “transforming DNA” are used interchangeably to refer to DNA that is used to introduce sequences into a host cell or organism. The DNA may be generated in vitro by PCR or any other suitable technique(s) known to those in the art. In some embodiments, the DNA construct comprises a sequence of interest (e.g., as an “incoming sequence”). In some embodiments, the sequence is operably linked to additional elements such as control elements (e.g., promoters, etc.). In some embodiments, the DNA construct further comprises at least one selectable marker. In some further embodiments, the DNA construct comprises an incoming sequence flanked by homology boxes. In some further embodiments, the transforming DNA comprises other non-homologous sequences, added to the ends (e.g., stuffer sequences or flanks). In some embodiments, the ends of the incoming sequence are closed such that the transforming DNA forms a closed circle. The transforming sequences may be wild-type, mutant or modified. In some embodiments, the DNA construct comprises sequences homologous to the host cell chromosome. In some other embodiments, the DNA construct comprises non-homologous sequences. Once the DNA construct is assembled in vitro, it may be used to: 1) insert heterologous sequences into a desired target sequence of a host cell; 2) mutagenize a region of the host cell chromosome (i.e., replace an endogenous sequence with a heterologous sequence); 3) delete target genes; and/or 4) introduce a replicating plasmid into the host. In some embodiments, the incoming sequence comprises at least one selectable marker. This sequence can code for one or more proteins of interest. It can have other biological functions. In many cases the incoming sequence comprises at least one selectable marker, such as a gene that confers antimicrobial resistance.

As used herein, the terms “expression cassette” and “expression vector” refer to nucleic acid constructs generated recombinantly or synthetically, with a series of specified nucleic acid elements that permit transcription of a particular nucleic acid in a target cell. The recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette/vector includes, among other sequences, a nucleic acid sequence to be transcribed and a promoter. In some embodiments, expression vectors have the ability to incorporate and express heterologous DNA fragments in a host cell. Many prokaryotic and eukaryotic expression vectors are commercially available. Selection of appropriate expression vectors is within the knowledge of those of skill in the art. The term “expression cassette” is used interchangeably herein with “DNA construct,” and their grammatical equivalents.

As used herein, the term “vector” refers to a polynucleotide construct designed to introduce nucleic acids into one or more cell types. Vectors include cloning vectors, expression vectors, shuttle vectors, plasmids, cassettes and the like. In some embodiments, the polynucleotide construct comprises a DNA sequence encoding the enzyme (e.g., precursor or mature enzyme) that is operably linked to a suitable prosequence capable of effecting the expression of the DNA in a suitable host.

As used herein, “a secretion signal peptide” can be a propeptide, a prepeptide or both. For example, the term “propeptide” refers to a protein precursor that is cleaved to yield a mature protein. The term “prepeptide” refers to a polypeptide synthesized with an N-terminal signal peptide that targets it for secretion. Accordingly, a “pre-pro-peptide” is a polypeptide that contains a signal peptide that targets the polypeptide for secretion and which is cleaved off to yield a mature polypeptide. Signal peptides are found at the N-terminus of the protein and are typically composed of between about 3 to about 136 basic and hydrophobic amino acids.

As used herein, the term “plasmid” refers to a circular double-stranded (ds) DNA construct used as a cloning vector, and which forms an extrachromosomal self-replicating genetic element in some eukaryotes or prokaryotes, or integrates into the host chromosome.

As used herein in the context of introducing a nucleic acid sequence into a cell, the term “introduced” refers to any method suitable for transferring the nucleic acid sequence into the cell. Such methods for introduction include but are not limited to protoplast fusion, transfection, transformation, conjugation, transduction, and electroporation.

As used herein, the terms “transformed” and “stably transformed” refer to a cell that has a non-native (i.e., heterologous) polynucleotide sequence integrated into its genome or as an episomal plasmid that is maintained for at least two generations.

As used herein, the terms “control sequences” and “regulatory sequences” refer to nucleic acid sequences necessary and/or useful for expression of a polynucleotide encoding a polypeptide. In some embodiments, control sequences are native (i.e., from the same gene) or foreign (i.e., from a different gene) to the polynucleotide encoding the polypeptide. Control sequences include, but are not limited to leaders, polyadenylation sequences, propeptide sequences, promoters, signal peptide sequences, and transcription terminators. In some embodiments, at a minimum, control sequences include a promoter, and transcriptional and translational stop signals. In some embodiments, control sequences are provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the polynucleotide encoding the polypeptide.

As used herein, “operably linked” refers to a configuration in which a control sequence is appropriately placed (i.e., in a functional relationship) at a position relative to a polynucleotide of interest such that the control sequence directs or regulates the expression of the polynucleotide and/or polypeptide of interest. Thus, a nucleic acid is “operably linked” to another nucleic acid sequence when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA encoding a secretory leader (i.e., a signal peptide), is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, “operably linked” means that the DNA sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers are used in accordance with conventional practice.

As used herein the term “gene” refers to a polynucleotide (e.g., a DNA segment), that encodes a polypeptide and includes regions preceding and following the coding regions as well as intervening sequences (introns) between individual coding segments (exons).

Nucleic acids “hybridize” when they associate, typically in solution. There are numerous texts and other reference materials that provide details regarding hybridization methods for nucleic acids (See e.g., Tijssen, “Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes,” Part 1, Chapter 2, Elsevier, New York, [1993], incorporated herein by reference). For polynucleotides of at least 100 nucleotides in length, low to very high stringency conditions are defined as follows: prehybridization and hybridization at 42° C. in 5×SSPE, 0.3% SDS, 200 μg/ml sheared and denatured salmon sperm DNA, and either 25% formamide for low stringencies, 35% formamide for medium and medium-high stringencies, or 50% formamide for high and very high stringencies, following standard Southern blotting procedures. For polynucleotides of at least 200 nucleotides in length, the carrier material is finally washed three times each for 15 minutes using 2×SSC, 0.2% SDS at least at 50° C. (“low” stringency), at least at 55° C. (“medium” or “moderate” stringency), at least at 60° C. (“medium-high” stringency), at least at 65° C. (“high” stringency), and at least at 70° C. (“very high” stringency). In some embodiments, the stringency conditions include those that: (1) employ low ionic strength and high temperature for washing, for example 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate at 50° C.; (2) employ a denaturing agent during hybridization, such as formamide, for example, 50% (v/v) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM sodium chloride, 75 mM sodium citrate at 42° C.; or (3) employ 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5×Denhardt's solution, sonicated salmon sperm DNA (50 μg/mL), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at 42° C. in 0.2×SSC (sodium chloride/sodium citrate) and 50% formamide at 55° C., followed by a high-stringency wash consisting of 0.1×SSC containing EDTA at 55° C. In other embodiments, the stringency conditions include overnight incubation at 37° C. in a solution comprising: 20% formamide, 5×SSC (150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 mg/mL denatured sheared salmon sperm DNA, followed by washing the filters in 1×SSC at about 37-50° C. The skilled artisan will recognize how to adjust the temperature, ionic strength, etc. as necessary to accommodate factors to accomplish the desired stringency.
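For convenience, the formamide concentrations and minimum wash temperatures recited above for the five named stringency levels can be collected in a lookup table. The following is a minimal sketch only; the names `Stringency`, `STRINGENCY`, and `highest_level_for_wash` are hypothetical, and the classification by wash temperature alone is a simplifying assumption that ignores the other recited conditions (e.g., ionic strength and formamide concentration).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stringency:
    formamide_percent: int  # formamide during (pre)hybridization at 42 deg C
    min_wash_temp_c: int    # minimum wash temperature in 2xSSC, 0.2% SDS


# Values transcribed from the definition above, ordered from low to very high.
STRINGENCY = {
    "low": Stringency(25, 50),
    "medium": Stringency(35, 55),
    "medium-high": Stringency(35, 60),
    "high": Stringency(50, 65),
    "very high": Stringency(50, 70),
}


def highest_level_for_wash(wash_temp_c: float) -> Optional[str]:
    """Return the most stringent named level whose minimum wash
    temperature is met, or None if the wash is below 50 deg C."""
    best = None
    for name, cond in STRINGENCY.items():  # dicts keep insertion order
        if wash_temp_c >= cond.min_wash_temp_c:
            best = name
    return best


print(STRINGENCY["high"])
print(highest_level_for_wash(62))
```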

As used herein, an “endogenous” or “homologous” gene refers to a gene that is found in a parental strain of a cell (e.g., a fungal or bacterial cell). In some embodiments, endogenous genes are present in wild-type strains. As used herein in making comparisons between nucleic acid sequences, “homologous genes” (or “homologue” genes) refers to genes from different, but usually related species, that correspond to each other and are identical or very similar to each other. The term encompasses genes that are separated by speciation (i.e., the development of new species) (e.g., orthologous genes), as well as genes that have been separated by genetic duplication (e.g., paralogous genes).

As used herein, “heterologous” polynucleotides are any polynucleotides that are introduced into a host cell through the use of laboratory techniques/manipulation, and include polynucleotides that are removed from a host cell, subjected to laboratory manipulation, and then reintroduced into a host cell.

As used herein, when used with reference to a nucleic acid or polypeptide, the term “heterologous” refers to a sequence that is not normally expressed and secreted by an organism (e.g., a “wild-type” organism). In some embodiments, the term encompasses a sequence that comprises two or more subsequences which are not found in the same relationship to each other as normally found in nature, or is recombinantly engineered so that its level of expression, or physical relationship to other nucleic acids or other molecules in a cell, or structure, is not normally found in nature. For instance, a heterologous nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged in a manner not found in nature (e.g., a nucleic acid open reading frame (ORF) of the invention operatively linked to a promoter sequence inserted into an expression cassette, such as a vector).

As used herein, a “heterologous enzyme” is used in reference to an enzyme that is encoded by a heterologous gene. However, it is also contemplated herein that a heterologous gene can encode an endogenous or homologous enzyme. As used herein, the term “heterologous gene” refers to a gene that occurs in a form not found in a parental strain of the fungal cell. Thus, in some embodiments, a heterologous gene is a gene that is derived from a species that is different from the species of the fungal cell expressing the gene and recognized anamorphs, teleomorphs or taxonomic equivalents of the fungal cell expressing the gene. In some embodiments, a heterologous gene is a modified version of a gene that is endogenous to the host fungal cell (e.g., an endogenous gene subjected to manipulation and then introduced or transformed into the host cell). For example, in some embodiments, a heterologous gene has an endogenous coding sequence, but has modifications in the promoter sequence. Similarly, in other embodiments, a heterologous gene encodes the same amino acid sequence as an endogenous gene, but has modifications in codon usage and/or to noncoding regions (e.g., introns), and/or combinations thereof. For example, in some embodiments, a heterologous gene contains modifications to the coding sequence to encode a non-wild-type polypeptide. As another example, in some embodiments, a heterologous gene has the same promoter sequence, 5′ and 3′ untranslated regions and coding regions as a parental strain, but is located in another region of the same chromosome, or on an entirely different chromosome as compared to a parental strain of the host cell. In some embodiments, the heterologous gene is a gene that has been modified to overexpress a gene product of interest.

As used herein, “recombinant” includes reference to a cell or vector that has been modified by the introduction of a heterologous nucleic acid sequence, or to a cell that is derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found in identical form within the native (i.e., non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all as a result of deliberate human intervention. “Recombinant,” “engineered,” and “non-naturally occurring,” when used with reference to a cell, nucleic acid, or polypeptide, refers to a material, or a material corresponding to the natural or native form of the material, that has been modified in a manner that would not otherwise exist in nature, or is identical thereto but produced or derived from synthetic materials and/or by manipulation using recombinant techniques. Non-limiting examples include, among others, recombinant cells expressing genes that are not found within the native (i.e., non-recombinant) form of the cell or express native genes that are otherwise expressed at a different level. “Recombination,” “recombining,” and “generating a recombined” nucleic acid also encompass the assembly of two or more nucleic acid fragments wherein the assembly gives rise to a chimeric gene.

As used herein, a “genetically modified” or “genetically engineered” cell is a cell whose genetic material has been altered using genetic engineering techniques. A genetically modified cell also refers to a derivative of or the progeny of a cell whose genetic material has been altered using genetic engineering techniques. An example of a genetic modification as a result of genetic engineering techniques includes a modification to the genomic DNA. Another example of a genetic modification as a result of genetic engineering techniques includes introduction of a stable heterologous nucleic acid into the cell. For example, in some embodiments, the genetically modified fungal cell of the present invention secretes a reduced amount of at least one protease or the secreted enzyme has a reduced ability to oxidize cellobiose.

As used herein, the term “overexpression” refers to any state in which a gene is caused to be expressed at an elevated rate or level as compared to the endogenous expression rate or level for that gene. In some embodiments, “overexpression” includes an elevated translation rate or level of the gene compared to the endogenous translation rate or level for that gene. In some embodiments, overexpression includes an elevated transcription rate or level of the gene compared to the endogenous transcription rate or level for that gene. For example, in some embodiments, a heterologous gene is introduced into a fungal cell to express a gene encoding a heterologous enzyme such as a beta-glucosidase from another organism. In some other embodiments, a heterologous gene is introduced into a fungal cell to overexpress a gene encoding a homologous enzyme such as a beta-glucosidase.

In some embodiments, mutant DNA sequences are generated using site saturation mutagenesis in at least one codon. In some other embodiments, site saturation mutagenesis is performed for two or more codons. In some further embodiments, mutant DNA sequences have more than about 50%, more than about 55%, more than about 60%, more than about 65%, more than about 70%, more than about 75%, more than about 80%, more than about 81%, more than about 82%, more than about 83%, more than about 84%, more than about 85%, more than about 86%, more than about 87%, more than about 88%, more than about 89%, more than about 90%, more than about 91%, more than about 92%, more than about 93%, more than about 94%, more than about 95%, more than about 96%, more than about 97%, more than about 98%, or more than about 99% homology with the wild-type sequence. In some alternative embodiments, mutant DNA is generated in vivo using any suitable known mutagenic procedures including, but not limited to the use of radiation, nitrosoguanidine, etc. The desired DNA sequence is then isolated and used in the methods provided herein.
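As a purely illustrative, in silico view of site saturation at a single codon, the set of variant coding sequences can be enumerated by substituting every alternative codon at the chosen position. The following is a minimal sketch, assuming an in-frame, uppercase DNA coding sequence and a zero-based codon index; the function name `saturate_codon` is hypothetical, and laboratory protocols often use degenerate codons rather than all 64 triplets.

```python
from itertools import product

BASES = "ACGT"
# All 64 possible DNA codons.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def saturate_codon(coding_seq: str, codon_index: int) -> list[str]:
    """Return every variant of coding_seq in which the codon at
    codon_index (zero-based) is replaced by a different codon."""
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = 3 * codon_index
    if not 0 <= start < len(coding_seq):
        raise ValueError("codon index out of range")
    original = coding_seq[start:start + 3]
    return [
        coding_seq[:start] + codon + coding_seq[start + 3:]
        for codon in ALL_CODONS
        if codon != original
    ]


# Example: saturate the second codon (GCT) of a three-codon sequence.
variants = saturate_codon("ATGGCTTAA", 1)
print(len(variants), variants[:3])
```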

As used herein, the terms “amplification” and “gene amplification” refer to a method by which specific DNA sequences are disproportionately replicated such that the amplified gene becomes present in a higher copy number than was initially present in the genome. In some embodiments, selection of cells by growth in the presence of a drug (e.g., an inhibitor of an inhibitable enzyme) results in the amplification of either the endogenous gene encoding the gene product required for growth in the presence of the drug or by amplification of exogenous (i.e., input) sequences encoding this gene product, or both. “Amplification” is a special case of nucleic acid replication involving template specificity. It is to be contrasted with non-specific template replication (i.e., replication that is template-dependent but not dependent on a specific template). Template specificity is here distinguished from fidelity of replication (i.e., synthesis of the proper polynucleotide sequence) and nucleotide (ribo- or deoxyribo-) specificity. Template specificity is frequently described in terms of “target” specificity. Target sequences are “targets” in the sense that they are sought to be sorted out from other nucleic acid. Amplification techniques have been designed primarily for this sorting out.

As used herein, the term “primer” refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, that is capable of acting as a synthesis initiation point when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. In some embodiments, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. As known in the art, the exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.

As used herein, the term “probe” refers to an oligonucleotide (i.e., a sequence of nucleotides), whether occurring naturally as in a purified restriction digest or produced synthetically, recombinantly or by PCR amplification, that is capable of hybridizing to another oligonucleotide of interest. A probe may be single-stranded or double-stranded. Probes are useful in the detection, identification and isolation of particular gene sequences. It is contemplated that any probe used in the present invention will be labeled with any “reporter molecule,” so that it is detectable in any detection system, including, but not limited to, enzyme (e.g., ELISA, as well as enzyme-based histochemical assays), fluorescent, radioactive, and luminescent systems. It is not intended that the present invention be limited to any particular detection system or label.

As used herein, the term “target,” when used in reference to the polymerase chain reaction, refers to the region of nucleic acid bounded by the primers used for polymerase chain reaction. Thus, the “target” is sought to be sorted out from other nucleic acid sequences. A “segment” is defined as a region of nucleic acid within the target sequence.

As used herein, the term “polymerase chain reaction” (PCR) refers to the methods of U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, hereby incorporated by reference, which include methods for increasing the concentration of a segment of a target sequence in a mixture of genomic DNA without cloning or purification. This method for amplifying the target sequence is well known in the art.

As used herein, the term “amplification reagents” refers to those reagents (deoxyribonucleotide triphosphates, buffer, etc.) needed for amplification except for primers, nucleic acid template and the amplification enzyme. Typically, amplification reagents along with other reaction components are placed and contained in a reaction vessel (test tube, microwell, etc.).

As used herein, the terms “restriction endonucleases” and “restriction enzymes” refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.

A “restriction site” refers to a nucleotide sequence recognized and cleaved by a given restriction endonuclease and is frequently the site for insertion of DNA fragments. In some embodiments of the invention, restriction sites are engineered into the selective marker and into 5′ and 3′ ends of the DNA construct.

As used herein, “homologous recombination” means the exchange of DNA fragments between two DNA molecules or paired chromosomes at the site of identical or nearly identical nucleotide sequences. In some embodiments, chromosomal integration is homologous recombination.

As used herein “amino acid” refers to peptide or protein sequences or portions thereof. The terms “protein,” “peptide,” and “polypeptide” are used interchangeably in reference to a polymer of amino acid residues. The term “amino acid” refers to naturally occurring and synthetic amino acids, as well as amino acid analogs. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified (e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine). The term “amino acid analogs” refers to compounds that have the same basic chemical structure as a naturally occurring amino acid (i.e., an α-carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, such as homoserine, norleucine, methionine sulfoxide, or methionine methyl sulfonium). Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acids may be referred to herein by either their commonly known three-letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes. It is also understood that a polypeptide may be encoded by more than one nucleotide sequence, due to the degeneracy of the genetic code.

As used herein, an amino acid or nucleotide base “position” is denoted by a number that sequentially identifies each amino acid (or nucleotide base) in the reference sequence based on its position relative to the N-terminus (or 5′-end). Due to deletions, insertions, truncations, fusions, and the like that must be taken into account when determining an optimal alignment, the amino acid residue number in a test sequence determined by simply counting from the N-terminus will not necessarily be the same as the number of its corresponding position in the reference sequence. For example, in a case where a variant has a deletion relative to an aligned reference sequence, there will be no amino acid in the variant that corresponds to a position in the reference sequence at the site of deletion. Where there is an insertion in an aligned test sequence relative to the reference sequence, that insertion will not correspond to a numbered amino acid position in the reference sequence. In the case of truncations or fusions there can be stretches of amino acids in either the reference or aligned sequence that do not correspond to any amino acid in the corresponding sequence.
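The numbering convention described above can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned by some external method and that gaps are written as “-”; the function names and the example sequences are hypothetical and are not taken from the Sequence Listing.

```python
from typing import List, Optional, Tuple

GAP = "-"  # assumed gap symbol in the pre-computed alignment


def number_alignment(aligned_ref: str, aligned_test: str) -> List[Tuple[Optional[int], Optional[int]]]:
    """For each alignment column, return (reference position, test position).

    Positions are 1-based counts from the N-terminus (or 5'-end) of each
    ungapped sequence; a gap in either sequence is reported as None.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0
    test_pos = 0
    columns = []
    for ref_res, test_res in zip(aligned_ref, aligned_test):
        ref_num = None
        test_num = None
        if ref_res != GAP:
            ref_pos += 1
            ref_num = ref_pos
        if test_res != GAP:
            test_pos += 1
            test_num = test_pos
        columns.append((ref_num, test_num))
    return columns


def corresponding_position(columns, ref_position: int) -> Optional[int]:
    """Return the test-sequence position aligned to a reference position.

    None means the test sequence has a deletion at that reference position.
    """
    for ref_num, test_num in columns:
        if ref_num == ref_position:
            return test_num
    raise KeyError(f"reference position {ref_position} not in alignment")


# Hypothetical variant with a deletion at reference position 3.
deletion_columns = number_alignment("MKTAYIAK", "MK-AYIAK")
# Hypothetical variant with an insertion after reference position 2.
insertion_columns = number_alignment("MK-TA", "MKGTA")
```

In the deletion example, reference position 4 corresponds to the third residue of the variant when counted from its own N-terminus, which is the discrepancy the paragraph above describes. In the insertion example, the inserted residue has no reference number at all.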

As used herein, the terms “numbered with reference to” or “corresponding to,” when used in the context of the numbering of a given amino acid or polynucleotide sequence, refer to the numbering of the residues of a specified reference sequence when the given amino acid or polynucleotide sequence is compared to the reference sequence.

As used herein, “conservative substitution,” as used with respect to amino acids, refers to the substitution of an amino acid with a chemically similar amino acid. Amino acid substitutions that do not generally alter specific activity are well known in the art and are described in numerous textbooks. The most commonly occurring exchanges are Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, and Asp/Gly, as well as these in reverse. As used herein, a conservative substitute for a residue is another residue in the same group as shown below.

basic amino acids: arginine (R), lysine (K), histidine (H)
acidic amino acids: glutamic acid (E), aspartic acid (D)
polar amino acids: glutamine (Q), asparagine (N)
hydrophobic amino acids: leucine (L), isoleucine (I), valine (V)
aromatic amino acids: phenylalanine (F), tryptophan (W), tyrosine (Y)
small amino acids: glycine (G), alanine (A), serine (S), threonine (T), proline (P), cysteine (C), methionine (M)
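As an illustration only, the grouping above can be encoded as a lookup so that a proposed substitution can be checked against it. The dictionary and function names below are hypothetical; the group memberships are copied from the listing above.

```python
from typing import Optional

# Groups as listed above, keyed by one-letter amino acid codes.
CONSERVATIVE_GROUPS = {
    "basic": set("RKH"),
    "acidic": set("ED"),
    "polar": set("QN"),
    "hydrophobic": set("LIV"),
    "aromatic": set("FWY"),
    "small": set("GASTPCM"),
}


def group_of(residue: str) -> Optional[str]:
    """Return the name of the group containing the residue, or None."""
    residue = residue.upper()
    for name, members in CONSERVATIVE_GROUPS.items():
        if residue in members:
            return name
    return None


def is_conservative(original: str, substitute: str) -> bool:
    """True if substitute is a different residue in the same group as original."""
    if original.upper() == substitute.upper():
        return False  # not a substitution at all
    group = group_of(original)
    return group is not None and group == group_of(substitute)
```

Under this grouping, a Lys-to-Arg exchange is conservative, while a Leu-to-Phe exchange is not, because leucine is in the hydrophobic group and phenylalanine is in the aromatic group.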

The following nomenclature may be used to describe substitutions in a variant polypeptide or nucleic acid sequence relative to a reference sequence: “R-#-V,” where “#” refers to the position in the reference sequence, “R” refers to the amino acid (or base) at that position in the reference sequence, and “V” refers to the amino acid (or base) at that position in the variant sequence.
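A minimal sketch of how the “R-#-V” nomenclature might be handled programmatically is given below. It assumes that a substitution is written either with hyphens (e.g., “A-3-G”) or in compact form (e.g., “A3G”); both spellings, the function names, and the example sequence are assumptions made for illustration.

```python
import re
from typing import Tuple

# R = reference residue, # = 1-based reference position, V = variant residue.
# Hyphens between the three parts are treated as optional (an assumption).
_SUBSTITUTION = re.compile(r"^([A-Za-z])-?(\d+)-?([A-Za-z])$")


def parse_substitution(text: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'A3G' or 'A-3-G' into (R, #, V)."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable substitution: {text!r}")
    ref, pos, var = match.groups()
    return ref.upper(), int(pos), var.upper()


def apply_substitution(reference: str, text: str) -> str:
    """Return the variant sequence obtained by applying one substitution.

    Raises ValueError if the reference does not carry residue R at position #.
    """
    ref, pos, var = parse_substitution(text)
    if not 1 <= pos <= len(reference):
        raise ValueError(f"position {pos} is outside the reference sequence")
    if reference[pos - 1].upper() != ref:
        raise ValueError(
            f"reference has {reference[pos - 1]!r} at position {pos}, not {ref!r}"
        )
    return reference[: pos - 1] + var + reference[pos:]
```

Checking the reference residue before applying the change guards against numbering a substitution against the wrong reference sequence.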

The term “amino acid substitution set” or “substitution set” refers to a group of amino acid substitutions. A substitution set can have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or more amino acid substitutions.

As used herein, “deletion,” when used in reference to a polypeptide, refers to modification of the polypeptide by removal of one or more amino acids from a reference polypeptide. Deletions can comprise removal of 1 or more amino acids, 2 or more amino acids, 3 or more amino acids, 4 or more amino acids, 5 or more amino acids, 6 or more amino acids, 7 or more amino acids, 8 or more amino acids, 9 or more amino acids, 10 or more amino acids, 15 or more amino acids, or 20 or more amino acids, up to 10% of the total number of amino acids, or up to 20% of the total number of amino acids making up the polypeptide while retaining enzymatic activity and/or retaining the improved properties of at least one engineered protease enzyme. Deletions may be present in the internal portions and/or terminal portions of the polypeptide. In some embodiments, the deletion comprises a continuous segment, while in other embodiments, it is discontinuous.

As used herein, a “gene deletion” or “deletion mutation” is a mutation in which at least part of a sequence of the DNA making up the gene is missing. Thus, a “deletion” in reference to nucleic acids is a loss or replacement of genetic material resulting in a complete or partial disruption of the sequence of the DNA making up the gene. Any number of nucleotides can be deleted, from a single base to an entire piece of a chromosome. Thus, in some embodiments, the term “deletion” refers to the removal of a gene necessary for encoding a specific protein (e.g., a protease). In this case, the strain having this deletion can be referred to as a “deletion strain.” In some embodiments, the Myceliophthora (e.g., M. thermophila) is a deletion strain comprising deletion of at least one gene encoding at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4. In some additional embodiments, the Myceliophthora (e.g., M. thermophila) is a strain described in U.S. Pat. No. 8,236,551 and/or WO 2012/061382 (both of which are incorporated herein by reference), comprising deletion and/or inactivation of at least one cdh gene, and further comprising deletion of at least one polynucleotide sequence selected from SEQ ID NOS:1, 3, 4, and/or 6. In some embodiments, the Myceliophthora (e.g., M. thermophila) is a deletion strain comprising deletion of at least one polynucleotide sequence selected from SEQ ID NOS:1, 3, 4, and/or 6.

As used herein, “gene inactivation” refers to any alteration that results in greatly reduced or absent gene expression. The term encompasses any embodiment in which at least one gene is inactivated by any means, including but not limited to deletion, alterations, promoter alterations, antisense RNA, dsRNA, etc. In some embodiments, the Myceliophthora (e.g., M. thermophila) comprises a strain comprising inactivation of at least one gene encoding at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4. In some embodiments, the Myceliophthora (e.g., M. thermophila) is a strain comprising inactivation of at least one polynucleotide sequence selected from SEQ ID NOS:1, 3, 4, and/or 6. In some embodiments, the Myceliophthora (e.g., M. thermophila) comprises a strain described in U.S. Pat. No. 8,236,551 and/or WO 2012/061382, comprising deletion and/or inactivation of at least one cdh gene, and further comprising inactivation of at least one gene encoding at least one protease selected from Protease #1, Protease #2, Protease #3, and/or Protease #4. In some additional embodiments, the Myceliophthora (e.g., M. thermophila) is a strain described in U.S. Pat. No. 8,236,551 and/or WO 2012/061382 (both of which are incorporated herein by reference), comprising deletion and/or inactivation of at least one cdh gene, and further comprising inactivation of at least one polynucleotide sequence selected from SEQ ID NOS:1, 3, 4, and/or 6.

As used herein, “fragment” refers to a polypeptide that has an amino-terminal and/or carboxy-terminal and/or internal deletion, as compared to a reference polypeptide, but where the remaining amino acid sequence is identical to the corresponding positions in the reference sequence. Fragments can typically have about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% of the full-length of at least one protease polypeptide, for example the polypeptide of SEQ ID NOS:2, 4 and/or 6. In some instances, the sequences of the non-naturally occurring and wild-type at least one protease polypeptide disclosed herein include an initiating methionine (M) residue (i.e., M at position 1). However, the skilled artisan will recognize that this initiating methionine residue can be removed during the course of biological processing of the enzyme, such as in a host cell or in vitro translation system, to generate a mature enzyme lacking the initiating methionine residue, but otherwise retaining the enzyme's properties. Thus, for each of the protease polypeptides disclosed herein having an amino acid sequence comprising an initiating methionine, the present disclosure also encompasses the polypeptide with the initiating methionine residue deleted (i.e., a fragment of the at least one protease polypeptide lacking a methionine at position 1).

As used herein, the term “biologically active fragment” refers to a polypeptide that has an amino-terminal and/or carboxy-terminal deletion(s) and/or internal deletion(s), but where the remaining amino acid sequence is identical to the corresponding positions in the sequence to which it is being compared (e.g., a full-length protease of the present invention) and that retains substantially all of the activity of the full-length polypeptide. In some embodiments, the biologically active fragment is a biologically active protease fragment. A biologically active fragment can comprise about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% of a full-length protease polypeptide.

As used herein, “protein of interest” and “polypeptide of interest” refer to a protein/polypeptide that is desired and/or being assessed. In some embodiments, the protein of interest is expressed intracellularly, while in other embodiments, it is a secreted polypeptide. In some embodiments, the protein of interest is an enzyme, including but not limited to the enzymes described herein (e.g., a protease). In some embodiments, the protein of interest is a secreted polypeptide which is fused to a signal peptide (i.e., an amino-terminal extension on a protein to be secreted). Nearly all secreted proteins use an amino-terminal protein extension which plays a crucial role in the targeting to and translocation of precursor proteins across the membrane. This extension is proteolytically removed by a signal peptidase during or immediately following membrane transfer.

A polynucleotide is said to “encode” an RNA or a polypeptide if, in its native state or when manipulated by methods known to those of skill in the art, it can be transcribed and/or translated to produce the RNA, the polypeptide or a fragment thereof. The anti-sense strand of such a nucleic acid is also said to encode the sequences. As is known in the art, DNA can be transcribed by an RNA polymerase to produce RNA, and RNA can be reverse transcribed by reverse transcriptase to produce a DNA. Thus, a DNA molecule can effectively encode an RNA molecule and vice versa.

As used herein, “host strain” and “host cell” refer to a suitable host for an expression vector comprising DNA. The “host cells” used in the present invention generally are prokaryotic or eukaryotic hosts which preferably have been manipulated by methods known to those skilled in the art. In some embodiments, host cells are transformed with vectors constructed using recombinant DNA techniques. Such transformed host cells are capable of either replicating vectors encoding protein variant(s) and/or expressing the desired protein variant(s). In the case of vectors which encode the pre- or prepro-form of the protein variant, such variants, when expressed, are typically secreted from the host cell into the host cell medium.

As used herein, “naturally-occurring enzyme” refers to an enzyme having the unmodified amino acid sequence identical to that found in nature (i.e., “wild-type”). Naturally occurring enzymes include native enzymes (i.e., those enzymes naturally expressed or found in the particular microorganism).

The terms “wild-type sequence” and “naturally-occurring sequence” are used interchangeably herein, to refer to a polypeptide or polynucleotide sequence that is native or naturally occurring in a host cell. In some embodiments, the wild-type sequence refers to a sequence of interest that is the starting point of a protein engineering project. The wild-type sequence may encode either a homologous or heterologous protein.

As used herein, the terms “isolated” and “purified” refer to a material that is removed from its original environment (e.g., the natural environment, if it is naturally occurring). For example, the material is said to be “purified” when it is present in a particular composition in a higher or lower concentration than exists in a naturally-occurring or wild-type organism or in combination with components not normally present upon expression from a naturally-occurring or wild-type organism. For example, a naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. In some embodiments, such polynucleotides are part of a vector, and/or such polynucleotides or polypeptides are part of a composition, and still considered to be isolated, in that such vector or composition is not part of its natural environment. In some embodiments, a nucleic acid or protein is said to be purified, for example, if it gives rise to essentially one band in an electrophoretic gel or blot. In some embodiments, the terms “isolated” and “purified” are used to refer to a molecule (e.g., an isolated nucleic acid, polypeptide, etc.) or other component that is removed from at least one other component with which it is naturally associated. In some embodiments, the term “isolated” refers to a nucleic acid, polypeptide, or other component that is partially or completely separated from components with which it is normally associated in nature. Thus, the term encompasses a substance in a form or environment that does not occur in nature.

Non-limiting examples of isolated substances include: any non-naturally occurring substance; any substance including, but not limited to, any enzyme, variant, polynucleotide, protein, peptide or cofactor, that is at least partially removed from one or more or all of the naturally occurring constituents with which it is associated in nature; any substance modified by the hand of man relative to that substance found in nature; and/or any substance modified by increasing the amount of the substance relative to other components with which it is naturally associated (e.g., multiple copies of a gene encoding the substance; and/or use of a stronger promoter than the promoter naturally associated with the gene encoding the substance). In some embodiments, a polypeptide of interest is used in industrial applications, such as ethanol production, in the form of a fermentation broth product (i.e., the polypeptide is a component of a fermentation broth). In some embodiments, in addition to the polypeptide of interest (e.g., an EG1b polypeptide), the fermentation broth product further comprises ingredients used in the fermentation process (e.g., cells, including the host cells containing the gene encoding the polypeptide of interest and/or the polypeptide of interest), cell debris, biomass, fermentation media, and/or fermentation products. In some embodiments, the fermentation broth is optionally subjected to one or more purification steps (e.g., filtration) to remove or reduce at least one component of a fermentation process. Accordingly, in some embodiments, an isolated substance is present in such a fermentation broth product.

The terms “purification” and “isolation,” when used in reference to an enzyme (e.g., at least one protease), mean that the enzyme is altered from its natural state by virtue of separating the enzyme from some or all of the naturally occurring constituents with which it is associated in nature. This may be accomplished by any suitable art-recognized separation technique, including but not limited to ion exchange chromatography, affinity chromatography, hydrophobic separation, dialysis, protease treatment, ammonium sulphate precipitation or other protein salt precipitation, centrifugation, size exclusion chromatography, filtration, microfiltration, gel electrophoresis, separation on a gradient or any other suitable methods, to remove whole cells, cell debris, impurities, extraneous proteins, or enzymes undesired in the final composition. It is further possible to then add constituents to an enzyme-containing composition which provide additional benefits, for example, activating agents, anti-inhibition agents, desirable ions, compounds to control pH, other enzymes, etc.

The term “isolated,” when used in reference to a DNA sequence, refers to a DNA sequence that has been removed from its natural genetic milieu and is thus free of other extraneous or unwanted coding sequences, and is in a form suitable for use within genetically engineered protein production systems. Such isolated molecules are those that are separated from their natural environment and include cDNA and genomic clones. Isolated DNA molecules of the present invention are free of other genes with which they are ordinarily associated, but may include naturally occurring 5′ and 3′ untranslated regions (e.g., promoters and terminators). The identification of associated regions will be evident to one of ordinary skill in the art (See e.g., Dynan and Tjian, Nature 316:774-78 [1985]). The term “an isolated DNA sequence” is alternatively referred to as “a cloned DNA sequence.”

The term “isolated,” when used in reference to a protein, refers to a protein that is found in a condition other than its native environment. In some embodiments, the isolated protein is substantially free of other proteins, particularly other homologous proteins. An isolated protein is more than about 10% pure, preferably more than about 20% pure, and even more preferably more than about 30% pure, as determined by SDS-PAGE. Further aspects of the invention encompass the protein in a highly purified form (i.e., more than about 40% pure, more than about 50% pure, more than about 55% pure, more than about 60% pure, more than about 65% pure, more than about 70% pure, more than about 75% pure, more than about 80% pure, more than about 85% pure, more than about 90% pure, more than about 95% pure, more than about 96% pure, more than about 97% pure, more than about 98% pure, or even more than about 99% pure), as determined by SDS-PAGE.

As used herein, the phrase “substantially pure polypeptide” refers to a composition in which the polypeptide species is the predominant species present (i.e., on a molar or weight basis, it is more abundant than any other individual macromolecular species in the composition), and is generally a substantially purified composition when the object species comprises at least about 50 percent of the macromolecular species present by mole or percent weight. Generally, a substantially pure enzyme composition will comprise about 60% or more, about 65% or more, about 70% or more, about 75% or more, about 80% or more, about 85% or more, about 90% or more, about 95% or more, about 96% or more, about 97% or more, about 98% or more, or about 99% or more of all macromolecular species by mole or percent weight present in the composition. Solvent species, small molecules (<500 Daltons), and elemental ion species are not considered macromolecular species.

As used herein, the term “starting gene” refers to a gene of interest that encodes a protein of interest that is to be improved, deleted, mutated, and/or otherwise changed using the present invention.

The term “property” and grammatical equivalents thereof in the context of a nucleic acid, as used herein, refer to any characteristic or attribute of a nucleic acid that can be selected or detected. These properties include, but are not limited to, a property affecting binding to a polypeptide, a property conferred on a cell comprising a particular nucleic acid, a property affecting gene transcription (e.g., promoter strength, promoter recognition, promoter regulation, and/or enhancer function), a property affecting RNA processing (e.g., RNA splicing, RNA stability, RNA conformation, and/or post-transcriptional modification), and a property affecting translation (e.g., level, regulation, binding of mRNA to ribosomal proteins, and/or post-translational modification). For example, a binding site for a transcription factor, polymerase, regulatory factor, etc., of a nucleic acid may be altered to produce desired characteristics or to identify undesirable characteristics.

The term “property” and grammatical equivalents thereof in the context of a polypeptide (including proteins), as used herein, refer to any characteristic or attribute of a polypeptide that can be selected or detected. These properties include, but are not limited to, oxidative stability, substrate specificity, catalytic activity, thermal stability, alkaline stability, pH activity profile, resistance to proteolytic degradation, K_m, k_cat, k_cat/K_m ratio, protein folding, inducing an immune response, not inducing an immune response, ability to bind to a ligand, ability to bind to a receptor, ability to be secreted, ability to be displayed on the surface of a cell, ability to oligomerize, ability to signal, ability to stimulate cell proliferation, ability to inhibit cell proliferation, ability to induce apoptosis, ability to be modified by phosphorylation or glycosylation, and/or ability to treat disease, etc. Indeed, it is not intended that the present invention be limited to any particular property.

As used herein, the term “screening” has its usual meaning in the art and is, in general, a multi-step process. In the first step, a mutant nucleic acid or variant polypeptide is provided. In the second step, a property of the mutant nucleic acid or variant polypeptide is determined. In the third step, the determined property is compared to a property of the corresponding precursor nucleic acid, to the property of the corresponding naturally occurring polypeptide or to the property of the starting material (e.g., the initial sequence) for the generation of the mutant nucleic acid. It will be apparent to the skilled artisan that the screening procedure for obtaining a nucleic acid or protein with an altered property depends upon the property of the starting material, and upon the modification that the generation of the mutant nucleic acid is intended to facilitate. The skilled artisan will therefore appreciate that the invention is not limited to any specific property to be screened for and that the following description of properties lists illustrative examples only. Methods for screening for any particular property are generally described in the art. For example, one can measure binding, pH optima, specificity, etc., before and after mutation, wherein a change indicates an alteration. In some embodiments, the screens are performed in a high-throughput manner, including multiple samples being screened simultaneously, including, but not limited to, assays utilizing chips, phage display, multiple substrates and/or indicators, and/or any other suitable method known in the art. As used in some embodiments, screens encompass selection steps in which variants of interest are enriched from a population of variants. It is intended that the term encompass any suitable means for selection. Indeed, it is not intended that the present invention be limited to any particular method of screening.

As used herein, the term “targeted randomization” refers to a process that produces a plurality of sequences where one or several positions have been randomized. In some embodiments, randomization is complete (i.e., all four nucleotides, A, T, G, and C can occur at a randomized position). In some alternative embodiments, randomization of a nucleotide is limited to a subset of the four nucleotides. Targeted randomization can be applied to one or several codons of a sequence, coding for one or several proteins of interest. When expressed, the resulting libraries produce protein populations in which one or more amino acid positions can contain a mixture of all 20 amino acids or a subset of amino acids, as determined by the randomization scheme of the randomized codon. In some embodiments, the individual members of a population resulting from targeted randomization differ in the number of amino acids, due to targeted or random insertion or deletion of codons. In some further embodiments, synthetic amino acids are included in the protein populations produced. In some additional embodiments, the majority of members of a population resulting from targeted randomization show greater sequence homology to the consensus sequence than the starting gene. In some embodiments, the sequence encodes one or more proteins of interest. In some alternative embodiments, the proteins have differing biological functions.
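The relationship between a randomization scheme and the resulting amino acid mixture can be sketched as follows. The example assumes the standard genetic code and describes a randomized codon by the nucleotides allowed at each of its three positions; the function name and the two example schemes (complete randomization, and a third position limited to G or T) are illustrative assumptions, not schemes taken from this disclosure.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: aa
    for (b1, b2, b3), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def randomized_outcomes(allowed_bases):
    """Count the amino acids (and stops) encoded by one randomized codon.

    allowed_bases is a sequence of three strings, one per codon position,
    each listing the nucleotides permitted at that position.
    """
    if len(allowed_bases) != 3:
        raise ValueError("a codon has exactly three positions")
    return Counter(
        CODON_TABLE[b1 + b2 + b3] for b1, b2, b3 in product(*allowed_bases)
    )


# Complete randomization: all four nucleotides at every position.
complete = randomized_outcomes(["ACGT", "ACGT", "ACGT"])
# Limited randomization: third position restricted to G or T.
limited = randomized_outcomes(["ACGT", "ACGT", "GT"])
```

Both schemes still reach all 20 amino acids, but the limited scheme does so with 32 codons instead of 64 and with one stop codon instead of three, which is one way a randomization scheme determines the composition of the resulting protein population.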

The terms “modified nucleic acid sequence” and “modified genes” are used interchangeably herein to refer to a nucleic acid sequence that includes a deletion, insertion, substitution or any other change and/or interruption of the naturally occurring nucleic acid sequence. In some embodiments, the expression product of the modified sequence is a truncated protein (e.g., if the modification is a deletion or interruption in the sequence). In some embodiments, the truncated protein retains biological activity. In some alternative embodiments, the expression product of the modified sequence is an elongated protein (e.g., modifications comprising an insertion into the nucleic acid sequence). In some further embodiments, an insertion leads to a truncated protein (e.g., when the insertion results in the formation of a stop codon). Thus, an insertion may result in either a truncated protein or an elongated protein as an expression product.

As used herein, the terms “mutant nucleic acid sequence,” “mutant nucleotide sequence,” and “mutant gene” are used interchangeably in reference to a nucleotide sequence that has an alteration in at least one codon occurring in a host cell's wild-type nucleotide sequence. The expression product of the mutant sequence is a protein with an altered amino acid sequence relative to the wild-type. In some embodiments, the expression product has an altered functional capacity (e.g., enhanced enzymatic activity).

As used herein, the term “degenerate codon” refers to a codon used to represent a set of different codons (also referred to as an “ambiguous codon”). For example, the degenerate codon “NNT” represents a set of 16 codons having the base triplet sequence (A, C, T, or G)/(A, C, T, or G)/T.
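The expansion of a degenerate codon into the set of codons it represents can be illustrated with a short sketch. It assumes the conventional IUPAC single-letter nucleotide ambiguity codes; the function name is hypothetical.

```python
from itertools import product
from typing import List

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_degenerate_codon(codon: str) -> List[str]:
    """Return every concrete codon represented by a degenerate codon."""
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three symbols")
    try:
        choices = [IUPAC_BASES[symbol] for symbol in codon]
    except KeyError as error:
        raise ValueError(f"unknown nucleotide symbol: {error.args[0]!r}") from None
    return ["".join(bases) for bases in product(*choices)]


nnt_codons = expand_degenerate_codon("NNT")
```

For “NNT,” the expansion yields 4 x 4 x 1 = 16 codons, each ending in T, in agreement with the example given in the definition.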

As used herein, “coding sequence” refers to that portion of a polynucleotide that encodes an amino acid sequence of a protein (e.g., a gene).

As used herein, the term “antibodies” refers to immunoglobulins. Antibodies include but are not limited to immunoglobulins obtained directly from any species from which it is desirable to obtain antibodies. In addition, the present invention encompasses modified antibodies. The term also refers to antibody fragments that retain the ability to bind to the epitope that the intact antibody binds and includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, and anti-idiotype (anti-ID) antibodies. Antibody fragments include, but are not limited to, the complementarity-determining regions (CDRs), single-chain fragment variable regions (scFv), heavy chain variable region (VH), and light chain variable region (VL) fragments.

As used herein, the term “oxidation stable” refers to enzymes of the present invention that retain a specified amount of enzymatic activity over a given period of time under conditions prevailing during the use of the invention, for example while exposed to or contacted with oxidizing agents. In some embodiments, the enzymes retain at least about 50%, about 60%, about 70%, about 75%, about 80%, about 85%, about 90%, about 92%, about 95%, about 96%, about 97%, about 98%, or about 99% enzymatic activity after contact with an oxidizing agent over a given time period, for example, at least about 1 minute, about 3 minutes, about 5 minutes, about 8 minutes, about 12 minutes, about 16 minutes, about 20 minutes, etc.

As used herein, the terms “thermally stable” and “thermostable” refer to enzymes of the present invention that retain a specified amount of enzymatic activity after exposure to identified temperatures over a given period of time under conditions prevailing during the use of the enzyme, for example, when exposed to altered temperatures. “Altered temperatures” include increased or decreased temperatures. In some embodiments, the enzymes retain at least about 50%, about 60%, about 70%, about 75%, about 80%, about 85%, about 90%, about 92%, about 95%, about 96%, about 97%, about 98%, or about 99% enzymatic activity after exposure to altered temperatures over a given time period, for example, at least about 60 minutes, about 120 minutes, about 180 minutes, about 240 minutes, about 300 minutes, etc.
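The retained-activity criterion used in the stability definitions above can be sketched as a simple ratio. This is only an illustration: the function names are hypothetical, the activities are assumed to be measured in the same units before and after the challenge, and the tolerance implied by the word “about” is not modeled.

```python
def percent_retained(activity_before, activity_after):
    """Percentage of the initial enzymatic activity that remains."""
    if activity_before <= 0:
        raise ValueError("initial activity must be positive")
    return 100.0 * activity_after / activity_before

def retains_at_least(activity_before, activity_after, threshold_percent):
    """True if the enzyme keeps at least the given percentage of activity."""
    return percent_retained(activity_before, activity_after) >= threshold_percent

# Hypothetical measurement: 200 units before and 150 units after
# 60 minutes at an altered temperature.
print(percent_retained(200.0, 150.0))        # 75.0
print(retains_at_least(200.0, 150.0, 70.0))  # True
```

The same ratio applies whether the challenge is an oxidizing agent, an altered temperature, a solvent, or a pH shift; only the exposure conditions change.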

As used herein, the term “thermophilic fungus” refers to any fungus which exhibits optimum growth at a temperature of at least about 35° C., and generally below about 100° C., such as for example between about 35° C. and about 80° C., between about 35° C. and about 75° C., between about 40° C. and about 65° C., or between about 40° C. and about 60° C. Typically, the optimum growth is exhibited at a temperature of at least about 35° C. to about 60° C.

As used herein, “solvent stable” refers to a polypeptide that maintains similar activity (e.g., more than about 60% to about 80%) after exposure to varying concentrations (e.g., about 5 to about 99%) of a non-aqueous solvent (e.g., isopropyl alcohol, tetrahydrofuran, 2-methyltetrahydrofuran, acetone, toluene, butyl acetate, methyl tert-butyl ether, etc.) for a period of time (e.g., about 0.5 to about 24 hrs) compared to a reference polypeptide.

As used herein, “pH stable” refers to a polypeptide that maintains similar activity (e.g., more than about 60% to about 80%) after exposure to low or high pH (e.g., about 4.5 to about 6, or about 8 to about 12) for a period of time (e.g., 0.5-24 hrs) compared to a reference polypeptide.

As used herein, the term “enhanced stability” in the context of an oxidation, chelator, thermal and/or pH stable enzyme refers to a higher retained enzymatic activity over time as compared to other enzymes and/or wild-type enzymes.

As used herein, the term “diminished stability” in the context of an oxidation, chelator, thermal and/or pH stable enzyme refers to a lower retained enzymatic activity over time as compared to other enzymes and/or wild-type enzymes.

As used herein, “secreted activity” refers to enzymatic activity of at least one protease enzyme produced by a fungal cell that is present in an extracellular environment. An extracellular environment can be, for example, an extracellular milieu such as a culture medium. The secreted activity is influenced by the total amount of at least one protease secreted, and also is influenced by the catalytic efficiency of the secreted at least one protease.

As used herein, a “protease that is secreted by a cell” is a protease produced by the cell in a manner such that the protease is exported across the cell membrane and then subsequently released into the extracellular milieu, such as into culture media.

As used herein, the term “culturing” refers to growing a population of microbial cells under suitable conditions in a liquid or solid medium.

The terms “biomass,” and “biomass substrate,” encompass any suitable materials for use in saccharification reactions. The terms encompass, but are not limited to materials that comprise cellulose (i.e., “cellulosic biomass,” “cellulosic feedstock,” and “cellulosic substrate”). Biomass can be derived from plants, animals, or microorganisms, and may include, but is not limited to agricultural, industrial, and forestry residues, industrial and municipal wastes, and terrestrial and aquatic crops grown for energy purposes. Examples of biomass substrates include, but are not limited to, wood, wood pulp, paper pulp, corn fiber, corn grain, corn cobs, crop residues such as corn husks, corn stover, grasses, wheat, wheat straw, barley, barley straw, hay, rice, rice straw, switchgrass, waste paper, paper and pulp processing waste, woody or herbaceous plants, fruit or vegetable pulp, distillers grain, rice hulls, cotton, hemp, flax, sisal, sugar cane bagasse, sorghum, soy, components obtained from milling of grains, trees, branches, roots, leaves, wood chips, sawdust, shrubs and bushes, vegetables, fruits, and flowers and any suitable mixtures thereof. In some embodiments, the biomass comprises, but is not limited to cultivated crops (e.g., grasses, including C4 grasses, such as switch grass, cord grass, rye grass, miscanthus, reed canary grass, or any combination thereof), sugar processing residues, for example, but not limited to, bagasse (e.g., sugar cane bagasse, beet pulp [e.g., sugar beet], or a combination thereof), agricultural residues (e.g., soybean stover, corn stover, corn fiber, rice straw, sugar cane straw, rice, rice hulls, barley straw, corn cobs, wheat straw, canola straw, oat straw, oat hulls, hemp, flax, sisal, cotton, or any combination thereof), fruit pulp, vegetable pulp, distillers' grains, forestry biomass (e.g., wood, wood pulp, paper pulp, recycled wood pulp fiber, sawdust, hardwood, such as aspen wood, softwood, or a combination thereof). Furthermore, in some embodiments, the biomass comprises cellulosic waste material and/or forestry waste materials, including but not limited to, paper and pulp processing waste, municipal paper waste, newsprint, cardboard and the like. In some embodiments, biomass comprises one species of fiber, while in some alternative embodiments, the biomass comprises a mixture of fibers that originate from different biomasses. In some embodiments, the biomass may also comprise transgenic plants that express ligninase and/or cellulase enzymes (See e.g., US 2008/0104724 A1).

A biomass substrate is said to be “pretreated” when it has been processed by some physical and/or chemical means to facilitate saccharification. As described further herein, in some embodiments, the biomass substrate is “pretreated,” or treated using methods known in the art, such as chemical pretreatment (e.g., ammonia pretreatment, dilute acid pretreatment, dilute alkali pretreatment, or solvent exposure), physical pretreatment (e.g., steam explosion or irradiation), mechanical pretreatment (e.g., grinding or milling) and biological pretreatment (e.g., application of lignin-solubilizing microorganisms) and combinations thereof, to increase the susceptibility of cellulose to hydrolysis. Thus, the term “biomass” encompasses any living or dead biological material that contains a polysaccharide substrate, including but not limited to cellulose, starch, other forms of long-chain carbohydrate polymers, and mixtures of such sources. It may or may not be assembled entirely or primarily from glucose or xylose, and may optionally also contain various other pentose or hexose monomers. Xylose is an aldopentose containing five carbon atoms and an aldehyde group. It is the precursor to hemicellulose, and is often a main constituent of biomass. In some embodiments, the substrate is slurried prior to pretreatment. In some embodiments, the consistency of the slurry is between about 2% and about 30% and more typically between about 4% and about 15%. In some embodiments, the slurry is subjected to a water and/or acid soaking operation prior to pretreatment. In some embodiments, the slurry is dewatered using any suitable method to reduce steam and chemical usage prior to pretreatment. Examples of dewatering devices include, but are not limited to, pressurized screw presses (See e.g., WO 2010/022511, incorporated herein by reference), pressurized filters, and extruders.

In some embodiments, the pretreatment is carried out to hydrolyze hemicellulose, and/or a portion thereof present in the cellulosic substrate to monomeric pentose and hexose sugars (e.g., xylose, arabinose, mannose, galactose, and/or any combination thereof). In some embodiments, the pretreatment is carried out so that nearly complete hydrolysis of the hemicellulose and a small amount of conversion of cellulose to glucose occurs. In some embodiments, an acid concentration in the aqueous slurry from about 0.02% (w/w) to about 2% (w/w), or any amount therebetween, is typically used for the treatment of the cellulosic substrate. Any suitable acid finds use in these methods, including but not limited to, hydrochloric acid, nitric acid, and/or sulfuric acid. In some embodiments, the acid used during pretreatment is sulfuric acid. Steam explosion is one method of performing acid pretreatment of biomass substrates (See e.g., U.S. Pat. No. 4,461,648). Another method of pretreating the slurry involves continuous pretreatment (i.e., the cellulosic biomass is pumped through a reactor continuously). These methods are well-known to those skilled in the art (See e.g., U.S. Pat. No. 7,754,457).

In some embodiments, alkali is used in the pretreatment. In contrast to acid pretreatment, pretreatment with alkali may not hydrolyze the hemicellulose component of the biomass. Rather, the alkali reacts with acidic groups present on the hemicellulose to open up the surface of the substrate. In some embodiments, the addition of alkali alters the crystal structure of the cellulose so that it is more amenable to hydrolysis. Examples of alkali that find use in the pretreatment include, but are not limited to ammonia, ammonium hydroxide, potassium hydroxide, and sodium hydroxide. One method of alkali pretreatment is Ammonia Freeze Explosion, Ammonia Fiber Explosion or Ammonia Fiber Expansion (“AFEX” process; See e.g., U.S. Pat. Nos. 5,171,592; 5,037,663; 4,600,590; 6,106,888; 4,356,196; 5,939,544; and 6,176,176). During this process, the cellulosic substrate is contacted with ammonia or ammonium hydroxide in a pressure vessel for a sufficient time to enable the ammonia or ammonium hydroxide to alter the crystal structure of the cellulose fibers. The pressure is then rapidly reduced, which allows the ammonia to flash or boil and explode the cellulose fiber structure. In some embodiments, the flashed ammonia is then recovered using methods known in the art. In some alternative methods, dilute ammonia pretreatment is utilized. The dilute ammonia pretreatment method utilizes more dilute solutions of ammonia or ammonium hydroxide than AFEX (See e.g., WO 2009/045651 and US 2007/0031953). This pretreatment process may or may not produce any monosaccharides.

An additional pretreatment process for use in the present invention includes chemical treatment of the cellulosic substrate with organic solvents, in methods such as those utilizing organic liquids in pretreatment systems (See e.g., U.S. Pat. No. 4,556,430; incorporated herein by reference). These methods have the advantage that the low boiling point liquids easily can be recovered and reused. Other pretreatments, such as the Organosolv™ process, also use organic liquids (See e.g., U.S. Pat. No. 7,465,791, which is also incorporated herein by reference). Subjecting the substrate to pressurized water may also be a suitable pretreatment method (See e.g., Weil et al., Appl. Biochem. Biotechnol., 68(1-2): 21-40 [1997], which is incorporated herein by reference). In some embodiments, the pretreated cellulosic biomass is processed after pretreatment by any of several steps, such as dilution with water, washing with water, buffering, filtration, or centrifugation, or any combination of these processes, prior to enzymatic hydrolysis, as is familiar to those skilled in the art. The pretreatment produces a pretreated feedstock composition (e.g., a “pretreated feedstock slurry”) that contains a soluble component including the sugars resulting from hydrolysis of the hemicellulose, optionally acetic acid and other inhibitors, and solids including unhydrolyzed feedstock and lignin. In some embodiments, the soluble components of the pretreated feedstock composition are separated from the solids to produce a soluble fraction. In some embodiments, the soluble fraction, including the sugars released during pretreatment and other soluble components (e.g., inhibitors), is then sent to fermentation. However, in some embodiments in which the hemicellulose is not effectively hydrolyzed during the pretreatment, one or more additional steps are included (e.g., a further hydrolysis step(s) and/or enzymatic treatment step(s) and/or further alkali and/or acid treatment) to produce fermentable sugars. In some embodiments, the separation is carried out by washing the pretreated feedstock composition with an aqueous solution to produce a wash stream and a solids stream comprising the unhydrolyzed, pretreated feedstock. Alternatively, the soluble component is separated from the solids by subjecting the pretreated feedstock composition to a solids-liquid separation, using any suitable method (e.g., centrifugation, microfiltration, plate and frame filtration, cross-flow filtration, pressure filtration, vacuum filtration, etc.). Optionally, in some embodiments, a washing step is incorporated into the solids-liquids separation. In some embodiments, the separated solids containing cellulose then undergo enzymatic hydrolysis with cellulase enzymes in order to convert the cellulose to glucose. In some embodiments, the pretreated feedstock composition is fed into the fermentation process without separation of the solids contained therein. In some embodiments, the unhydrolyzed solids are subjected to enzymatic hydrolysis with cellulase enzymes to convert the cellulose to glucose after the fermentation process. In some embodiments, the pretreated cellulosic feedstock is subjected to enzymatic hydrolysis with cellulase enzymes.

Lignocellulose (also “lignocellulosic biomass”) comprises a matrix of cellulose, hemicellulose and lignin. Economic production of biofuels from lignocellulosic biomass typically involves conversion of the cellulose and hemicellulose components to fermentable sugars, typically monosaccharides such as glucose (from the cellulose) and xylose and arabinose (from the hemicelluloses). Nearly complete conversion can be achieved by a chemical pretreatment of the lignocellulose followed by enzymatic hydrolysis with cellulase enzymes. The chemical pretreatment step renders the cellulose more susceptible to enzymatic hydrolysis and in some cases, also hydrolyzes the hemicellulose component. Numerous chemical pretreatment processes known in the art find use in the present invention, and include, but are not limited to, mild acid pretreatment at high temperatures and dilute acid, ammonium pretreatment and/or organic solvent extraction.

Lignin is a more complex and heterogeneous biopolymer than either cellulose or hemicellulose and comprises a variety of phenolic subunits. Enzymatic lignin depolymerization can be accomplished by lignin peroxidases, manganese peroxidases, laccases, esterases, and/or cellobiose dehydrogenases (CDH), often working in synergy. However, as the name suggests, CDH enzymes also oxidize cellobiose to cellobionolactone. Several reports indicate that the oxidation of cellobiose by CDH enhances the rate of cellulose hydrolysis by cellulases by virtue of reducing the concentrations of cellobiose, which is a potent inhibitor of some cellulase components (See e.g., Mansfield et al., Appl. Environ. Microbiol., 63: 3804-3809 [1997]; and Igarashi et al., Eur. J. Biochem., 253:101-106 [1998]). Recently, it has been reported that CDHs can enhance the activity of cellulolytic enhancing proteins from Glycosyl Hydrolase family 61 (See e.g., WO 2010/080532 A1).

Thus, as used herein, the term “lignocellulosic biomass” refers to any plant biomass comprising cellulose and hemicellulose, bound to lignin. In some embodiments, the biomass may optionally be pretreated to increase the susceptibility of cellulose to hydrolysis by chemical, physical and biological pretreatments (such as steam explosion, pulping, grinding, acid hydrolysis, solvent exposure, and the like, as well as combinations thereof). Various lignocellulosic feedstocks find use, including those that comprise fresh lignocellulosic feedstock, partially dried lignocellulosic feedstock, fully dried lignocellulosic feedstock, and/or any combination thereof. In some embodiments, lignocellulosic feedstocks comprise cellulose in an amount greater than about 20%, more preferably greater than about 30%, more preferably greater than about 40% (w/w). For example, in some embodiments, the lignocellulosic material comprises from about 20% to about 90% (w/w) cellulose, or any amount therebetween, although in some embodiments, the lignocellulosic material comprises less than about 19%, less than about 18%, less than about 17%, less than about 16%, less than about 15%, less than about 14%, less than about 13%, less than about 12%, less than about 11%, less than about 10%, less than about 9%, less than about 8%, less than about 7%, less than about 6%, or less than about 5% cellulose (w/w). Furthermore, in some embodiments, the lignocellulosic feedstock comprises lignin in an amount greater than about 10%, more typically in an amount greater than about 15% (w/w). In some embodiments, the lignocellulosic feedstock comprises small amounts of sucrose, fructose and/or starch. The lignocellulosic feedstock is generally first subjected to size reduction by methods including, but not limited to, milling, grinding, agitation, shredding, compression/expansion, or other types of mechanical action. Size reduction by mechanical action can be performed by any type of equipment adapted for the purpose, for example, but not limited to, hammer mills, tub-grinders, roll presses, refiners and hydrapulpers. In some embodiments, at least 90% by weight of the particles produced from the size reduction have lengths less than between about 1/16 and about 4 in (the measurement may be a volume or a weight average length). In some embodiments, the equipment used for the particle size reduction is a hammer mill or shredder. Subsequent to size reduction, the feedstock is typically slurried in water, as this facilitates pumping of the feedstock. In some embodiments, lignocellulosic feedstocks of particle size less than about 6 inches do not require size reduction.

As used herein, the term “lignocellulosic feedstock” refers to any type of lignocellulosic biomass that is suitable for use as feedstock in saccharification reactions.

As used herein, the term “pretreated lignocellulosic feedstock,” refers to lignocellulosic feedstocks that have been subjected to physical and/or chemical processes to make the fiber more accessible and/or receptive to the actions of cellulolytic enzymes, as described above.

As used herein, the term “recovered” refers to the harvesting, isolating, collecting, or recovering of protein from a cell and/or culture medium. In the context of saccharification, it is used in reference to harvesting the fermentable sugars produced during the saccharification reaction from the culture medium and/or cells. In the context of fermentation, it is used in reference to harvesting the fermentation product from the culture medium and/or cells. Thus, a process can be said to comprise “recovering” a product of a reaction (such as a soluble sugar recovered from saccharification) if the process includes separating the product from other components of a reaction mixture subsequent to at least some of the product being generated in the reaction.

As used herein, the term “slurry” refers to an aqueous solution in which are dispersed one or more solid components, such as a cellulosic substrate.

As used herein, the term “saccharification” refers to the process in which substrates (e.g., cellulosic biomass) are broken down via the action of cellulases to produce fermentable sugars (e.g., monosaccharides such as but not limited to glucose).

As used herein, the term “fermentable sugars” refers to simple sugars (e.g., monosaccharides, disaccharides and short oligosaccharides), including but not limited to glucose, xylose, galactose, arabinose, mannose and sucrose. Indeed, a fermentable sugar is any sugar that a microorganism can utilize or ferment.

As used herein, the term “soluble sugars” refers to water-soluble hexose monomers and oligomers of up to about six monomer units.

As used herein, the term “fermentation” is used broadly to refer to the cultivation of a microorganism or a culture of microorganisms that use simple sugars, such as fermentable sugars, as an energy source to obtain a desired product.

As used herein, the term “fermenting organism” refers to any organism, including bacterial and fungal organisms such as yeast and filamentous fungi, suitable for producing at least one desired end product. Especially suitable fermenting organisms are able to ferment (i.e., convert) sugars, such as glucose, fructose, maltose, xylose, mannose and/or arabinose, directly or indirectly into a desired end product.

As used herein, the term “cellodextrin” refers to a glucose polymer of varying length (i.e., comprising at least two glucose monomers). Each glucose monomer is linked via a beta-1,4 glycosidic bond. A cellodextrin is classified by its degree of polymerization (DP), which indicates the number of glucose monomers the cellodextrin contains. The most common cellodextrins are: cellobiose (DP=2); cellotriose (DP=3); cellotetraose (DP=4); cellopentaose (DP=5); and cellohexaose (DP=6). In some embodiments, cellodextrins have a DP of 2-6 (i.e., cellobiose, cellotriose, cellotetraose, cellopentaose, and/or cellohexaose). In some embodiments, cellodextrins have a DP greater than 6. The degree of polymerization of cellodextrin molecules can be measured (e.g., by mass spectrometry, including but not limited to matrix-assisted laser desorption/ionization (MALDI) mass spectrometry and electrospray ionization ion trap (ESI-IT) mass spectrometry). Methods of measuring the degree of polymerization of cellodextrin molecules are known in the art (See e.g., Melander et al., Biomacromol., 7:1410-1421 [2006]).
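The relationship between degree of polymerization and molecular mass, which is what allows DP to be read from a mass spectrum, can be sketched as follows. This is a minimal illustration under stated assumptions: average molar masses of about 180.156 g/mol for glucose and 18.015 g/mol for water are assumed, one water is lost per glycosidic bond, and the input to `dp_from_mass` is taken to be the neutral molecular mass (adduct ions and charge states seen in real MALDI or ESI spectra are not handled). The function names are hypothetical.

```python
GLUCOSE_G_PER_MOL = 180.156  # assumed average molar mass of glucose
WATER_G_PER_MOL = 18.015     # assumed average molar mass of water

def cellodextrin_mass(dp):
    """Approximate molar mass of a linear cellodextrin with `dp` glucose units."""
    if dp < 2:
        raise ValueError("a cellodextrin has at least two glucose monomers")
    # dp glucose units joined by (dp - 1) glycosidic bonds,
    # each bond formed with the loss of one water molecule.
    return dp * GLUCOSE_G_PER_MOL - (dp - 1) * WATER_G_PER_MOL

def dp_from_mass(neutral_mass):
    """Estimate DP from a neutral molecular mass (inverse of the formula above)."""
    anhydroglucose = GLUCOSE_G_PER_MOL - WATER_G_PER_MOL
    return round((neutral_mass - WATER_G_PER_MOL) / anhydroglucose)

for dp in range(2, 7):
    print(dp, round(cellodextrin_mass(dp), 3))
```

Because successive cellodextrins differ by one anhydroglucose unit (about 162 g/mol), peaks in a spectrum of a cellodextrin mixture are expected to be spaced by roughly that amount.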

As used herein, the term “cellulase” refers to a category of enzymes capable of hydrolyzing cellulose (e.g., beta-1,4-glucan or beta-D-glucosidic linkages) to shorter cellulose chains, oligosaccharides, cellobiose and/or glucose. Cellulases, as known in the art and as described herein, are typically found in a mixture of different types of cellulolytic enzymes. In some embodiments, “cellulase” includes hemicellulose-hydrolyzing enzymes such as endoxylanase, beta-xylosidase, arabinofuranosidase, alpha-glucuronidase, acetylxylan esterase, feruloyl esterase, alpha-glucuronyl esterase, etc. A “cellulase-producing fungal cell” is a fungal cell that expresses and secretes at least one cellulose hydrolyzing enzyme. In some embodiments, the cellulase-producing fungal cells express and secrete a mixture of cellulose hydrolyzing enzymes. “Cellulolytic,” “cellulose hydrolyzing,” “cellulose degrading,” and similar terms refer to cellulase enzymes such as endoglucanases, cellobiohydrolases (the latter are also referred to as “exoglucanases”), and beta-glucosidases (also known as “cellobiases”) that act synergistically to break down the cellulose first to soluble di- or oligosaccharides such as cellobiose, which are then further hydrolyzed to glucose by beta-glucosidase. “Cellulases” typically comprise a mixture of different types of cellulolytic enzymes (e.g., endoglucanases, beta-glucosidases and cellobiohydrolases, the latter are also referred to as “exoglucanases”) that act synergistically to break down the cellulose to soluble di- or oligosaccharides such as cellobiose, which are then further hydrolyzed to glucose by beta-glucosidase. Cellulase enzymes are produced by a wide variety of microorganisms. Cellulases, as well as hemicellulases from filamentous fungi and some bacteria, are widely exploited for many industrial applications that involve processing of natural fibers to sugars.

Among the cellulase-producing filamentous fungi, there are those that also produce a variety of enzymes involved in lignin degradation. For example, organisms of such genera as Myceliophthora, Chrysosporium, Sporotrichum, Thielavia, Phanerochaete, Trichoderma and Trametes produce and secrete a mixture of cellulases, hemicellulases and lignin degrading enzymes. These types of organisms are commonly called “white rot fungi” by virtue of their ability to digest lignin and to distinguish them from the “brown rot” fungi (such as Trichoderma) which typically cannot digest lignin.

As used herein, the terms “cellobiose dehydrogenase” and “CDH” refer to a cellobiose:acceptor 1-oxidoreductase that catalyzes the conversion of cellobiose in the presence of an acceptor to cellobiono-1,5-lactone and a reduced acceptor. Examples of cellobiose dehydrogenases are included in the enzyme classification (E.C. 1.1.99.18).

As used herein, the term “endoglucanase” or “EG” refers to a class of cellulases (E.C. 3.2.1.4) that hydrolyze internal beta-1,4 glucosidic linkages in cellulose. The term “endoglucanase” refers to an endo-1,4-(1,3;1,4)-beta-D-glucan 4-glucanohydrolase (E.C. 3.2.1.4), which catalyzes endohydrolysis of 1,4-beta-D-glycosidic linkages in cellulose, cellulose derivatives (such as carboxymethyl cellulose and hydroxyethyl cellulose), lichenan, beta-1,4 bonds in mixed beta-1,3 glucans such as cereal beta-D-glucans or xyloglucans, and other plant material containing cellulosic components. Endoglucanase activity can be determined based on a reduction in substrate viscosity or increase in reducing ends determined by a reducing sugar assay (See e.g., Zhang et al., Biotechnol. Adv., 24: 452-481 [2006]). In some embodiments, endoglucanase activity is determined using carboxymethyl cellulose (CMC) hydrolysis (See e.g., Ghose, Pure Appl. Chem., 59: 257-268 [1987]).

As used herein, “EG1” refers to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 7 catalytic domain classified under EC 3.2.1.4 or any protein, polypeptide or catalytically active fragment thereof. In some embodiments, the EG1 is functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain. In some embodiments, the EG1 enzyme is EG1b.

As used herein, the term “EG2” refers to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 5 catalytic domain classified under EC 3.2.1.4 or any protein, polypeptide or catalytically active fragment thereof. In some embodiments, the EG2 is functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain.

As used herein, the term “EG3” refers to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 12 catalytic domain classified under EC 3.2.1.4 or any protein, polypeptide or catalytically active fragment thereof. In some embodiments, the EG3 is functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain.

As used herein, the term “EG4” refers to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 61 catalytic domain classified under EC 3.2.1.4 or any protein, polypeptide or fragment thereof. In some embodiments, the EG4 is functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain.

As used herein, the term “EG5” refers to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 45 catalytic domain classified under EC 3.2.1.4 or any protein, polypeptide or fragment thereof. In some embodiments, the EG5 is functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain.

As used herein, the term “EG6” refers to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 6 catalytic domain classified under EC 3.2.1.4 or any protein, polypeptide or fragment thereof. In some embodiments, the EG6 is functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain.

As used herein, the terms “cellobiohydrolase” and “CBH” are defined herein as a 1,4-beta-D-glucan cellobiohydrolase (E.C. 3.2.1.91), which catalyzes the hydrolysis of 1,4-beta-D-glucosidic linkages in cellulose, cellooligosaccharides, or any beta-1,4-linked glucose containing polymer, releasing cellobiose from the reducing or non-reducing ends of the chain (See e.g., Teeri, Trends Biotechnol., 15:160-167 [1997]; and Teeri et al., Biochem. Soc. Trans., 26: 173-178 [1998]). In some embodiments, cellobiohydrolase activity is determined using a fluorescent disaccharide derivative, 4-methylumbelliferyl-beta-D-lactoside (See e.g., van Tilbeurgh et al., FEBS Lett., 149: 152-156 [1982]; and van Tilbeurgh and Claeyssens, FEBS Lett., 187: 283-288 [1985]).

As used herein, the terms “CBH1” and “type 1 cellobiohydrolase” refer to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 7 catalytic domain classified under EC 3.2.1.91 or any protein, polypeptide or catalytically active fragment thereof. In some embodiments, the CBH1 is functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain.

As used herein, the terms “CBH2” and “type 2 cellobiohydrolase” refer to a carbohydrate active enzyme expressed from a nucleic acid sequence coding for a glycohydrolase (GH) Family 6 catalytic domain classified under EC 3.2.1.91 or any protein, polypeptide or catalytically active fragment thereof. Type 2 cellobiohydrolases are also commonly referred to as “the Cel6 family.” The CBH2 may be functionally linked to a carbohydrate binding module (CBM), such as a Family 1 cellulose binding domain.

As used herein, the terms “beta-glucosidase,” “cellobiase,” and “BGL” refer to a category of cellulases (EC 3.2.1.21) that catalyze the hydrolysis of cellobiose to glucose. More particularly, the term “beta-glucosidase” refers to beta-D-glucoside glucohydrolases (E.C. 3.2.1.21) that catalyze the hydrolysis of terminal non-reducing beta-D-glucose residues with the release of beta-D-glucose. Beta-glucosidase activity can be determined using any suitable method (See e.g., Venturi et al., J. Basic Microbiol., 42: 55-66 [2002]). In some embodiments, one unit of beta-glucosidase activity is defined as 1.0 μmole of p-nitrophenol produced per minute at 40° C., at pH 5 from 1 mM p-nitrophenyl-beta-D-glucopyranoside as substrate in 100 mM sodium citrate containing 0.01% TWEEN®-20.

As used herein, the terms “glycoside hydrolase 61” and “GH61” refer to a category of cellulases that enhance cellulose hydrolysis when used in conjunction with one or more additional cellulases. The GH61 family of cellulases is described, for example, in the Carbohydrate Active Enzymes (CAZY) database (See e.g., Harris et al., Biochem., 49(15):3305-16 [2010]).

A “hemicellulase,” as used herein, refers to a polypeptide that can catalyze hydrolysis of hemicellulose into small polysaccharides such as oligosaccharides, or monomeric saccharides. Hemicelluloses include xylan, glucuronoxylan, arabinoxylan, glucomannan and xyloglucan. Hemicellulases include, for example, the following: endoxylanases, beta-xylosidases, alpha-L-arabinofuranosidases, alpha-D-glucuronidases, feruloyl esterases, coumaroyl esterases, alpha-galactosidases, beta-galactosidases, beta-mannanases, and beta-mannosidases. In some embodiments, the present invention provides enzyme mixtures that comprise one or more hemicellulases.

As used herein, the terms “xylan degrading activity” and “xylanolytic activity” are defined as biological activities that hydrolyze xylan-containing material. The two basic approaches for measuring xylanolytic activity include: (1) measuring the total xylanolytic activity, and (2) measuring the individual xylanolytic activities (endoxylanases, beta-xylosidases, arabinofuranosidases, alpha-glucuronidases, acetylxylan esterases, feruloyl esterases, and alpha-glucuronyl esterases) (See e.g., Biely and Puchart, J. Sci. Food Agricul., 86: 1636-1647 [2006]; Spanikova and Biely, FEBS Lett., 580: 4597-4601 [2006]; and Herrmann et al., Biochem. J., 321: 375-381 [1997]). Total xylan degrading activity can be measured by determining the reducing sugars formed from various types of xylan, including oat spelt, beechwood, and larchwood xylans, or by photometric determination of dyed xylan fragments released from various covalently dyed xylans. A commonly used total xylanolytic activity assay is based on production of reducing sugars from polymeric 4-O-methyl glucuronoxylan (See e.g., Bailey et al., J. Biotechnol., 23(3): 257-270 [1992]). In some embodiments, xylan degrading activity is determined by measuring the increase in hydrolysis of birchwood xylan (Sigma) by xylan-degrading enzyme(s) under the following typical conditions: 1 mL reactions, 5 mg/mL substrate (total solids), 5 mg of xylanolytic protein/g of substrate, 50 mM sodium acetate at pH 5, 50° C., for 24 hours, and sugar analysis using p-hydroxybenzoic acid hydrazide (PHBAH) assay (See e.g., Lever, Anal. Biochem., 47: 273-279 [1972]).
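The typical assay conditions given above imply specific amounts of substrate and protein per reaction, which can be worked out as in the following sketch. The function name `xylan_assay_amounts` and its parameter names are hypothetical; the default values are the ones stated in the definition (1 mL reaction, 5 mg/mL substrate, 5 mg protein per g substrate).

```python
def xylan_assay_amounts(volume_ml=1.0,
                        substrate_mg_per_ml=5.0,
                        protein_mg_per_g_substrate=5.0):
    """Return (substrate in mg, protein in micrograms) for one reaction."""
    substrate_mg = volume_ml * substrate_mg_per_ml
    # 1 mg of substrate dosed at 1 mg protein per g substrate needs
    # 1 microgram of protein, so mg x (mg/g) gives micrograms directly.
    protein_ug = substrate_mg * protein_mg_per_g_substrate
    return substrate_mg, protein_ug

substrate_mg, protein_ug = xylan_assay_amounts()
print(substrate_mg, protein_ug)  # 5.0 25.0
```

Under these conditions each 1 mL reaction contains 5 mg of birchwood xylan and 25 micrograms of xylanolytic protein; scaling the reaction volume scales both amounts proportionally.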

As used herein, the term “xylanase activity” is defined as a 1,4-beta-D-xylan-xylohydrolase activity (E.C. 3.2.1.8) that catalyzes the endo-hydrolysis of 1,4-beta-D-xylosidic linkages in xylans. In some embodiments, xylanase activity is determined using birchwood xylan as substrate. One unit of xylanase activity is defined as 1.0 μmole of reducing sugar measured in glucose equivalents produced per minute during the initial period of hydrolysis at 50° C., at pH 5 from 2 g of birchwood xylan per liter as substrate in 50 mM sodium acetate containing 0.01% TWEEN®-20 (See e.g., Lever, Anal. Biochem., 47: 273-279 [1972]).
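By way of illustration only, the unit definition above can be written as a short calculation. The following is a minimal Python sketch; the function name and its parameters (micromoles of reducing sugar measured, reaction time, and volume of enzyme sample assayed) are hypothetical and are not part of any assay described herein.

```python
def xylanase_units_per_ml(umol_reducing_sugar, minutes, enzyme_volume_ml):
    """Return activity in units per mL of enzyme sample.

    One unit is taken as 1.0 umol of reducing sugar (glucose equivalents)
    produced per minute, as in the definition above. The inputs are assumed
    to come from the initial, approximately linear period of hydrolysis.
    """
    if minutes <= 0 or enzyme_volume_ml <= 0:
        raise ValueError("time and enzyme volume must be positive")
    units = umol_reducing_sugar / minutes  # umol per minute = units
    return units / enzyme_volume_ml


# Hypothetical reading: 3.0 umol of reducing sugar in 10 min from 0.05 mL of sample
print(xylanase_units_per_ml(3.0, 10.0, 0.05))
```

The same arithmetic applies to the p-nitrophenol-based unit definitions, with the micromoles of p-nitrophenol or p-nitrophenolate released in place of reducing sugar.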

As used herein, the term “beta-xylosidase activity” is defined as a beta-D-xyloside xylohydrolase activity (E.C. 3.2.1.37) that catalyzes the exo-hydrolysis of short beta (1→4)-xylooligosaccharides, to remove successive D-xylose residues from the non-reducing termini. In some embodiments, one unit of beta-xylosidase activity is defined as 1.0 μmole of p-nitrophenol produced per minute at 40° C., at pH 5 from 1 mM p-nitrophenyl-beta-D-xyloside as substrate in 100 mM sodium citrate containing 0.01% TWEEN®-20.

As used herein, the term “acetylxylan esterase activity” is defined as a carboxylesterase activity (EC 3.1.1.72) that catalyzes the hydrolysis of acetyl groups from polymeric xylan, acetylated xylose, acetylated glucose, alpha-naphthyl acetate, and p-nitrophenyl acetate. In some embodiments, acetylxylan esterase activity is determined using 0.5 mM p-nitrophenylacetate as substrate in 50 mM sodium acetate containing 0.01% TWEEN®-20, at pH 5.0. One unit of acetylxylan esterase activity is defined as the amount of enzyme capable of releasing 1 μmole of p-nitrophenolate anion per minute at pH 5, and 25° C.

As used herein, the term “feruloyl esterase activity” is defined as a 4-hydroxy-3-methoxycinnamoyl-sugar hydrolase activity (EC 3.1.1.73) that catalyzes the hydrolysis of the 4-hydroxy-3-methoxycinnamoyl (feruloyl) group from an esterified sugar, which is usually arabinose in “natural” substrates, to produce ferulate (4-hydroxy-3-methoxycinnamate). Feruloyl esterase is also known as ferulic acid esterase, hydroxycinnamoyl esterase, FAE-III, cinnamoyl ester hydrolase, FAEA, cinnAE, FAE-I, or FAE-II. In some embodiments, feruloyl esterase activity is determined using 0.5 mM p-nitrophenylferulate as substrate in 50 mM sodium acetate, at pH 5.0. One unit of feruloyl esterase activity equals the amount of enzyme capable of releasing 1 μmole of p-nitrophenolate anion per minute at pH 5, and 25° C.

As used herein, the term “alpha-glucuronidase activity” is defined as an alpha-D-glucosiduronate glucuronohydrolase activity (EC 3.2.1.139) that catalyzes the hydrolysis of an alpha-D-glucuronoside to D-glucuronate and an alcohol (See e.g., de Vries, J. Bacteriol., 180: 243-249 [1998]). One unit of alpha-glucuronidase activity equals the amount of enzyme capable of releasing 1 μmole of glucuronic or 4-O-methylglucuronic acid per minute at pH 5, 40° C.

As used herein, the term “alpha-L-arabinofuranosidase activity” is defined as an alpha-L-arabinofuranoside arabinofuranohydrolase activity (EC 3.2.1.55) that catalyzes the hydrolysis of terminal non-reducing alpha-L-arabinofuranoside residues in alpha-L-arabinosides. The enzyme activity acts on alpha-L-arabinofuranosides, alpha-L-arabinans containing (1,3)- and/or (1,5)-linkages, arabinoxylans, and arabinogalactans. Alpha-L-arabinofuranosidase is also known as arabinosidase, alpha-arabinosidase, alpha-L-arabinosidase, alpha-arabinofuranosidase, arabinofuranosidase, polysaccharide alpha-L-arabinofuranosidase, alpha-L-arabinofuranoside hydrolase, L-arabinosidase and alpha-L-arabinanase. In some embodiments, alpha-L-arabinofuranosidase activity is determined using 5 mg of medium viscosity wheat arabinoxylan (Megazyme International Ireland, Ltd., Wicklow, Ireland) per mL of 100 mM sodium acetate pH 5 in a total volume of 200 μL for 30 minutes at 40° C., followed by arabinose analysis by AMINEX® HPX-87H column chromatography (Bio-Rad Laboratories, Inc., Hercules, Calif.).

Enzymatic lignin depolymerization can be accomplished by lignin peroxidases, manganese peroxidases, laccases, and/or cellobiose dehydrogenases (CDH), often working in synergy. These extracellular enzymes, essential for lignin degradation, are often referred to as “lignin-modifying enzymes” or “LMEs.” Three of these enzymes are two glycosylated heme-containing peroxidases, lignin peroxidase (LIP) and Mn-dependent peroxidase (MNP), and a copper-containing phenoloxidase, laccase (LCC).

As used herein, the “total available cellulose” is the amount (wt %) of cellulose that is accessible to enzymatic hydrolysis. Total available cellulose is typically equal to, or very close to being equal to, the amount of initial cellulose present in a hydrolysis reaction.

As used herein, the “residual cellulose” is the portion (wt %) of the total available cellulose in the hydrolysis mixture that remains unhydrolyzed. Residual cellulose can be measured directly by, for example, IR spectroscopy, or can be measured by, for example, measuring the amount of glucose generated by concentrated acid hydrolysis of the residual solids.

As used herein, the “total hydrolyzed cellulose” is the portion of the total available cellulose that is hydrolyzed in the hydrolysis mixture. For example, the total hydrolyzed cellulose can be calculated as the difference between the “total available cellulose” and the “residual cellulose.” The “theoretical maximum glucose yield” is the maximum amount (wt %) of glucose that could be produced under a given condition from the total available cellulose.

As used herein, “Gmax” refers to the maximum amount (wt %) of glucose that could be produced from the total hydrolyzed cellulose. Gmax can be calculated, for example, by directly measuring the amount of residual cellulose remaining at the end of a reaction under given reaction conditions, subtracting the amount of residual cellulose from the total available cellulose to determine the total hydrolyzed cellulose, and then calculating the amount of glucose that could be produced from the total hydrolyzed cellulose.
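The calculation of Gmax outlined above can be sketched, for illustration only, as follows. This minimal Python example assumes that cellulose is converted to glucose with the stoichiometric mass gain of one water molecule per anhydroglucose unit (180.16/162.14, approximately 1.111); the function names and the example values are hypothetical.

```python
# Mass of glucose released per unit mass of cellulose hydrolyzed
# (molar mass of glucose / molar mass of an anhydroglucose unit).
# This conversion factor is an assumption of this sketch.
GLUCOSE_PER_CELLULOSE = 180.16 / 162.14


def total_hydrolyzed_cellulose(total_available, residual):
    """Difference between total available and residual cellulose (same units)."""
    if residual < 0 or residual > total_available:
        raise ValueError("residual must lie between 0 and total available")
    return total_available - residual


def gmax(total_available, residual):
    """Maximum glucose obtainable from the total hydrolyzed cellulose."""
    hydrolyzed = total_hydrolyzed_cellulose(total_available, residual)
    return hydrolyzed * GLUCOSE_PER_CELLULOSE


# Hypothetical values: 10.0 wt % available cellulose, 2.0 wt % residual
print(round(gmax(10.0, 2.0), 3))
```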

As used herein, “lipase” includes enzymes that hydrolyze lipids, fatty acids, and acylglycerides, including phosphoglycerides, lipoproteins, diacylglycerols, and the like. In plants, lipids are used as structural components to limit water loss and pathogen infection. These lipids include waxes derived from fatty acids, as well as cutin and suberin.

As used herein, the term “C1” refers to a Chrysosporium lucknowense fungal strain described by Garg (See, Garg, Mycopathol., 30: 3-4 [1966]). “Chrysosporium lucknowense” includes the strains described in U.S. Pat. Nos. 6,015,707, 5,811,381 and 6,573,086; US Pat. Pub. Nos. 2007/0238155, US 2008/0194005, US 2009/0099079; International Pat. Pub. Nos. WO 2008/073914 and WO 98/15633, and includes, without limitation, Chrysosporium lucknowense Garg 27K, VKM-F 3500 D (Accession No. VKM F-3500-D), C1 strain UV13-6 (Accession No. VKM F-3632 D), C1 strain NG7C-19 (Accession No. VKM F-3633 D), and C1 strain UV18-25 (VKM F-3631 D), all of which have been deposited at the All-Russian Collection of Microorganisms of Russian Academy of Sciences (VKM), Bakhurhina St. 8, Moscow, Russia, 113184, and any derivatives thereof. Although initially described as Chrysosporium lucknowense, C1 may currently be considered a strain of Myceliophthora thermophila. Other C1 strains include organisms deposited under accession numbers ATCC 44006, CBS (Centraalbureau voor Schimmelcultures) 122188, CBS 251.72, CBS 143.77, CBS 272.77, and VKM F-3500D. Exemplary C1 derivatives include modified organisms in which one or more endogenous genes or sequences have been deleted or modified and/or one or more heterologous genes or sequences have been introduced. Derivatives include UV18#100f Δalp1, UV18#100f Δpyr5 Δalp1, UV18#100.f Δalp1 Δpep4 Δalp2, UV18#100.f Δpyr5 Δalp1 Δpep4 Δalp2 and UV18#100.f Δpyr4 Δpyr5 Δalp1 Δpep4 Δalp2, as described in WO2008073914, incorporated herein by reference.

Methods for recombinant expression of proteins in fungi and other organisms are well known in the art, and a number of suitable expression vectors are available or can be constructed using routine methods. Protocols for cloning and expression in fungal hosts and other organisms are well known in the art (See e.g., Zhu et al., Plasmid 6:128-33 [2009]). Standard references for techniques and protocols are widely available and known to those in the art (See e.g., U.S. Pat. Nos. 6,015,707, 5,811,381 and 6,573,086; US Pat. Pub. Nos. US 2003/0187243, US 2007/0238155, US 2008/0194005, US 2009/0099079; WO 2008/073914 and WO 98/15633, each of which is incorporated by reference herein for all purposes).

Mutagenesis may be performed in accordance with any of the techniques known in the art, including random and site-specific mutagenesis. Directed evolution can be performed with any of the techniques known in the art to screen for improved promoter variants including shuffling. Mutagenesis and directed evolution methods are well known in the art (See e.g., U.S. Pat. Nos. 5,605,793, 5,830,721, 6,132,970, 6,420,175, 6,277,638, 6,365,408, 6,602,986, 7,288,375, 6,287,861, 6,297,053, 6,576,467, 6,444,468, 5,811,238, 6,117,679, 6,165,793, 6,180,406, 6,291,242, 6,995,017, 6,395,547, 6,506,602, 6,519,065, 6,506,603, 6,413,774, 6,573,098, 6,323,030, 6,344,356, 6,372,497, 7,868,138, 5,834,252, 5,928,905, 6,489,146, 6,096,548, 6,387,702, 6,391,552, 6,358,742, 6,482,647, 6,335,160, 6,653,072, 6,355,484, 6,03,344, 6,319,713, 6,613,514, 6,455,253, 6,579,678, 6,586,182, 6,406,855, 6,946,296, 7,534,564, 7,776,598, 5,837,458, 6,391,640, 6,309,883, 7,105,297, 7,795,030, 6,326,204, 6,251,674, 6,716,631, 6,528,311, 6,287,862, 6,335,198, 6,352,859, 6,379,964, 7,148,054, 7,629,170, 7,620,500, 6,365,377, 6,358,740, 6,406,910, 6,413,745, 6,436,675, 6,961,664, 7,430,477, 7,873,499, 7,702,464, 7,783,428, 7,747,391, 7,747,393, 7,751,986, 6,376,246, 6,426,224, 6,423,542, 6,479,652, 6,319,714, 6,521,453, 6,368,861, 7,421,347, 7,058,515, 7,024,312, 7,620,502, 7,853,410, 7,957,912, 7,904,249, and all related non-US counterparts; Ling et al., Anal. Biochem., 254(2):157-78 [1997]; Dale et al., Meth. Mol. Biol., 57:369-74 [1996]; Smith, Ann Rev. Genet., 19:423-462 [1985]; Botstein et al., Science, 229:1193-1201 [1985]; Carter, Biochem. J., 237:1-7 [1986]; Kramer et al., Cell, 38:879-887 [1984]; Wells et al., Gene, 34:315-323 [1985]; Minshull et al., Curr. Op. Chem. Biol., 3:284-290 [1999]; Christians et al., Nat. Biotechnol., 17:259-264 [1999]; Crameri et al., Nature, 391:288-291 [1998]; Crameri, et al., Nat. Biotechnol., 15:436-438 [1997]; Zhang et al., Proc. Nat. Acad. Sci. U.S.A., 94:4504-4509 [1997]; Crameri et al., Nat. Biotechnol., 14:315-319 [1996]; Stemmer, Nature, 370:389-391 [1994]; Stemmer, Proc. Nat. Acad. Sci. USA, 91:10747-10751 [1994]; WO 95/22625; WO 97/0078; WO 97/35966; WO 98/27230; WO 00/42651; WO 01/75767; and WO 2009/152336, all of which are incorporated herein by reference).

DETAILED DESCRIPTION OF THE INVENTION

In some embodiments, the improved fungal strains find use in hydrolyzing cellulosic material to glucose. In some embodiments, the improved fungal strains find use in hydrolyzing lignocellulose material. As indicated herein, the present invention provides improved fungal strains for the conversion of cellulose to fermentable sugars (e.g., glucose). In particular, the improved fungal strains provided herein are genetically modified to reduce the amount of endogenous protease activity secreted by the cells. The present invention also provides purified enzymes produced by the improved fungal strains provided herein.

Genetically Modified Fungal Cells

The genetically modified fungal cells provided herein exhibit a reduction in the amount of at least one endogenous protease activity that is secreted by the cell. It will be readily appreciated that any suitable genetic modification known in the art can be employed to reduce the secreted activity of at least one endogenous protease. For example, as described below, modifications contemplated herein include modifications that reduce the amount of at least one protease secreted by the cell. Modifications that reduce the amount of at least one protease expressed by the cell are also contemplated. Additional embodiments include modifications that reduce the transcription level of at least one protease. Still further embodiments include the complete or partial deletion of a gene encoding at least one protease. Other embodiments include modifications that reduce the catalytic efficiency of at least one protease.

In some genetically modified fungal cells provided herein, at least one protease activity secreted by the cell is reduced by at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more, relative to the level of at least one protease activity secreted by the unmodified parental fungal cell grown or cultured under essentially the same culture conditions. In some embodiments, the genetically modified fungal cells are Myceliophthora. In some embodiments, the genetically modified fungal cells are M. thermophila that do not produce at least one polypeptide selected from SEQ ID NOS:3, 6, 9, and/or 12. In some embodiments, the gene encoding at least one protease selected from the genes encoding Protease #1, Protease #2, Protease #3, and/or Protease #4 has been deleted from the Myceliophthora (e.g., M. thermophila). In some embodiments, at least one polynucleotide sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11 has been deleted from the genome of the Myceliophthora.

In some embodiments, the fungal cells of the present invention have been genetically modified to reduce the amount of at least one endogenous protease secreted by the cell. A reduction in the amount of secreted protease(s) can be a complete or partial reduction of the protease(s) secreted to the extracellular milieu. Reduction in the amount of secreted protease(s) can be accomplished by reducing the amount of at least one protease produced by the cell and/or by reducing the ability of the cell to secrete at least one protease produced by the cell. Methods for reducing the ability of the cell to secrete a polypeptide can be performed according to any of a variety of suitable methods known in the art (See e.g., Fass and Engels, J. Biol. Chem., 271:15244-15252 [1996], which is incorporated by reference herein in its entirety). For example, the gene encoding a secreted polypeptide can be modified to delete or inactivate a secretion signal peptide. In some embodiments, the fungal cells have been genetically modified to disrupt the N-terminal secretion signal peptide of at least one protease. In some embodiments, the amount of at least one protease secreted by the cell is reduced by at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more, relative to the secretion of at least one protease in an unmodified organism grown or cultured under essentially the same culture conditions.

Furthermore, in some embodiments, the total amount of at least one protease activity is reduced by at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more, relative to the total amount of at least one protease secreted in an unmodified organism grown or cultured under essentially the same culture conditions.

Decreased secretion of at least one protease can be determined by any of a variety of suitable methods known in the art for detection of protein or enzyme levels. For example, the levels of at least one protease in the supernatant of a fungal culture can be detected using Western blotting techniques, two-dimensional (2D) gels, or any other suitable protein detection techniques. Similarly, secreted protease activity in the supernatant of a fungal culture can be measured using any suitable activity assay as known in the art.
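For illustration only, the percent reductions recited above can be computed from any pair of measurements taken on the modified and the unmodified parental cells grown under essentially the same culture conditions. The following minimal Python sketch uses hypothetical function names and example values; the measurement itself (e.g., supernatant protease activity in arbitrary units) is assumed to come from whatever assay is used.

```python
def percent_reduction(parental_value, modified_value):
    """Percent reduction of a measurement relative to the parental strain."""
    if parental_value <= 0:
        raise ValueError("parental value must be positive")
    return 100.0 * (parental_value - modified_value) / parental_value


def meets_threshold(parental_value, modified_value, threshold_percent):
    """True if the reduction is at least the stated threshold."""
    return percent_reduction(parental_value, modified_value) >= threshold_percent


# Hypothetical supernatant activities (arbitrary units)
print(percent_reduction(200.0, 50.0))      # reduction relative to parental
print(meets_threshold(200.0, 50.0, 70.0))  # at least about 70%?
```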

In some embodiments, the fungal cells have been genetically modified to reduce the amount of at least one endogenous protease expressed by the cell. As used herein, expression refers to conversion of the information encoded in a gene to the protein encoded by that gene. Thus, a reduction of the amount of an expressed protease represents a reduction in the amount of the protease that is eventually translated by the cell. In some such embodiments, the reduction in the expression is accomplished by reducing the amount of mRNA that is transcribed from a gene encoding protease. In some other embodiments, the reduction in the expression is accomplished by reducing the amount of protein that is translated from an mRNA encoding protease.

The amount of protease expressed by the cell can be reduced by at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more, relative to the expression of protease in an unmodified fungal cell grown or cultured under essentially the same culture conditions. In some such embodiments, the reduction in the expression is accomplished by reducing the amount of mRNA that is transcribed from a gene encoding protease.

Furthermore, in some embodiments, a reduction in the expression level of a protease results in at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about a 99% reduction in the total expression level of protease activity by the fungal cell relative to an unmodified fungal cell grown or cultured under essentially the same culture conditions.

Decreased expression of a protease can be determined by any of a variety of methods known in the art for detection of protein or enzyme levels. For example, the levels of protease in the supernatant of a fungal culture can be detected using chromatographic methods, Western blotting techniques or any other suitable protein detection techniques that use an antibody specific to protease. Indeed, it is not intended that the present invention be limited to any particular method.

Methods for reducing production of a polypeptide are well known and can be performed using any of a variety of suitable methods known in the art. For example, the gene encoding a secreted polypeptide can be modified to disrupt a translation initiation sequence such as a Shine-Dalgarno sequence or a Kozak consensus sequence. Furthermore, the gene encoding a secreted polypeptide can be modified to introduce a frameshift mutation in the transcript encoding the endogenous protease. It will also be recognized that usage of uncommon codons can result in reduced expression of a polypeptide. It will be appreciated that in some embodiments, the gene encoding the protease has at least one nonsense mutation that results in the translation of a truncated protein.
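The effect of the nonsense and frameshift mutations mentioned above can be illustrated with a short translation routine. The following minimal Python sketch uses the standard genetic code and a made-up 18-nucleotide coding sequence; the sequence, the function name, and the chosen mutations are hypothetical and do not correspond to any sequence provided herein.

```python
# Standard genetic code in compact form: codons are ordered with T, C, A, G
# at each of the three positions, and "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def translate(dna):
    """Translate a coding DNA sequence from its first base until a stop codon."""
    seq = dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        index = (16 * BASES.index(codon[0])
                 + 4 * BASES.index(codon[1])
                 + BASES.index(codon[2]))
        amino_acid = AMINO_ACIDS[index]
        if amino_acid == "*":
            break  # stop codon ends translation
        protein.append(amino_acid)
    return "".join(protein)


wild_type = "ATGGCTAAAGGTTGGTAA"          # hypothetical open reading frame
nonsense = "ATGGCTTAAGGTTGGTAA"           # third codon AAA changed to stop TAA
frameshift = wild_type[:3] + wild_type[4:]  # single-nucleotide deletion

print(translate(wild_type))   # full-length product
print(translate(nonsense))    # truncated product
print(translate(frameshift))  # altered reading frame
```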

Other methods of reducing the amount of expressed polypeptide include post-transcriptional RNA silencing methodologies such as antisense RNA and RNA interference. Antisense techniques are well-established, and include using a nucleotide sequence complementary to the nucleic acid sequence of the gene. More specifically, expression of at least one protease-encoding gene by a fungal cell may be reduced or eliminated by introducing a nucleotide sequence complementary to the nucleic acid sequence, which may be transcribed in the cell and is capable of hybridizing to the mRNA produced in the cell. Under conditions allowing the complementary anti-sense nucleotide sequence to hybridize to the mRNA, the amount of protein translated is thus reduced or eliminated. Methods for expressing antisense RNA are known in the art (See e.g., Ngiam et al., Appl Environ Microbiol., 66(2):775-82 [2000]; and Zrenner et al., Planta., 190(2):247-52 [1993], both of which are hereby incorporated by reference herein in their entirety). In some embodiments, the mRNA is destabilized through secondary structure changes (e.g., altered introns). In some embodiments, destabilization occurs due to alterations in terminators.
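As a simple illustration of the complementarity on which antisense approaches rely, the antisense strand for a given stretch of mRNA is its reverse complement. The following minimal Python sketch uses a hypothetical function name and a made-up six-nucleotide sequence; it does not address the design considerations (length, target region, delivery) of any actual antisense construct.

```python
# Watson-Crick pairing for RNA bases
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna):
    """Return the reverse complement of an mRNA sequence, written 5' to 3'."""
    seq = mrna.upper()
    unexpected = set(seq) - set("ACGU")
    if unexpected:
        raise ValueError("unexpected characters: %s" % sorted(unexpected))
    return seq.translate(_RNA_COMPLEMENT)[::-1]


# Hypothetical mRNA fragment
print(antisense_rna("AUGGCC"))
```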

Furthermore, modification, downregulation or inactivation of at least one protease encoding gene provided herein may be obtained via RNA interference (RNAi) techniques (See e.g., Kadotani et al., Mol. Plant Microbe Interact., 16:769-76 [2003], which is incorporated by reference herein in its entirety). RNA interference methodologies include double stranded RNA (dsRNA), short hairpin RNAs (shRNAs) and small interfering RNAs (siRNAs). Potent silencing using dsRNA may be obtained using any suitable technique (See e.g., Fire et al., Nature 391:806-11 [1998]). Silencing using shRNAs is also well-established (See e.g., Paddison et al., Genes Dev., 16:948-958 [2002]). Silencing using siRNA techniques is also known (See e.g., Miyagishi et al., Nat. Biotechnol., 20:497-500 [2002]). The content of each of the above-cited references is incorporated by reference herein in its entirety.

In some embodiments, the fungal cells of the present invention have been genetically modified to reduce the transcription level of a gene encoding at least one endogenous protease. As used herein, transcription and similar terms refer to the conversion of the information encoded in a gene to an RNA transcript. Accordingly, a reduction of the transcription level of a protease is a reduction in the amount of RNA transcript coding for a protease. In some embodiments, the transcription level is reduced by at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more, relative to the transcription level of a protease in an unmodified organism grown or cultured under essentially the same culture conditions.

Furthermore, in some embodiments, a reduction in the transcription level of a protease results in at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about a 99% reduction in the total protease secreted by the fungal cell relative to an unmodified organism grown or cultured under essentially the same culture conditions. Decreased transcription can be determined by any of a variety of methods known in the art for detection of transcription levels. For example, the levels of transcription of a particular mRNA in a fungal cell can be detected using quantitative RT-PCR techniques or other RNA detection techniques that specifically detect a particular mRNA. Methods for reducing transcription level of a gene can be performed according to any suitable method known in the art, and include partial or complete deletion of the gene, and disruption or replacement of the promoter of the gene such that transcription of the gene is greatly reduced or even inhibited. For example, the promoter of the gene can be replaced with a weak promoter (See e.g., U.S. Pat. No. 6,933,133, which is incorporated by reference herein in its entirety). Thus, where the weak promoter is operably linked with the coding sequence of an endogenous polypeptide, transcription of that gene is greatly reduced or inhibited.
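One common way to turn quantitative RT-PCR data of the kind mentioned above into a relative transcription level is the comparative threshold-cycle (2^-ΔΔCt) calculation. The following minimal Python sketch illustrates that calculation only; it is not a method specified herein, it assumes approximately 100% amplification efficiency and a stably expressed reference gene, and all names and Ct values are hypothetical.

```python
def relative_transcript_level(ct_target_modified, ct_reference_modified,
                              ct_target_parental, ct_reference_parental):
    """Transcript level in the modified strain relative to the parental strain.

    Comparative Ct calculation: each target Ct is normalized to a reference
    gene, and the difference between strains is converted to a fold change.
    Assumes the template doubles every cycle (an assumption of this sketch).
    """
    delta_modified = ct_target_modified - ct_reference_modified
    delta_parental = ct_target_parental - ct_reference_parental
    delta_delta_ct = delta_modified - delta_parental
    return 2.0 ** (-delta_delta_ct)


def percent_transcription_reduction(relative_level):
    """Convert a relative level (1.0 = unchanged) into a percent reduction."""
    return 100.0 * (1.0 - relative_level)


# Hypothetical Ct values: protease transcript appears 3 cycles later in the
# modified strain, with an unchanged reference gene.
level = relative_transcript_level(25.0, 18.0, 22.0, 18.0)
print(level, percent_transcription_reduction(level))
```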

In some embodiments, the fungal cells of the present invention have been genetically modified to at least partially delete a gene encoding the endogenous protease. Typically, this deletion reduces or eliminates the total amount of endogenous protease secreted by the fungal cell. In some embodiments, complete or near-complete deletion of the gene sequence is contemplated. However, a deletion mutation need not completely remove the entire gene sequence encoding protease, in order to reduce the amount of endogenous protease secreted by the fungal cell. For example, in some embodiments, there is a partial deletion that removes one or more nucleotides encoding an amino acid in a protease active site, encoding a secretion signal, or encoding another portion of the protease that plays a role in endogenous protease activity being secreted by the fungal cell.

A deletion in a gene encoding protease in accordance with the embodiments provided herein includes a deletion of one or more nucleotides in the gene encoding the protease. In some embodiments, there is a deletion of at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100%, of the gene encoding the protease, wherein the amount of protease secreted by the cell is reduced.

Thus, in some embodiments, the deletion results in at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about a 99% reduction in the activity of the protease secreted by the fungal cell, relative to the activity of protease secreted by an unmodified organism grown or cultured under essentially the same culture conditions.

Furthermore, in some embodiments, the deletion results in at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about a 99% reduction in the total protease secreted by the fungal cell relative to an unmodified fungal cell grown or cultured under essentially the same culture conditions.

Deletion of a protease gene can be detected and confirmed by any of a variety of methods known in the art for detection of gene deletions, including the methods provided herein. For example, gene deletion can be confirmed using PCR amplification of the modified genomic region. It will be appreciated that additional suitable techniques for confirming deletion can be used and are well known, including Southern blot techniques, DNA sequencing of the modified genomic region, and screening for positive or negative markers incorporated during recombination events. Indeed, any suitable method known in the art finds use in the present invention.

Methods for complete and/or partial deletion of a gene are well-known and the genetically modified fungal cells described herein can be generated using any of a variety of deletion methods known in the art that result in a reduction in the amount of at least one endogenous protease secreted by the cells. Such methods may advantageously include standard gene disruption using homologous flanking markers (See e.g., Rothstein, Meth. Enzymol., 101:202-211 [1983], incorporated herein by reference in its entirety). Additional techniques for gene deletion include PCR-based methods for standard deletion (See e.g., Davidson et al., Microbiol., 148:2607-2615 [2002], incorporated herein by reference in its entirety).

Additional gene deletion techniques include, but are not limited to, “positive-negative” cassettes (See e.g., Chang et al., Proc. Natl. Acad. Sci. USA 84:4959-4963 [1987]), cre/lox based deletion (See e.g., Florea et al., Fung. Genet. Biol., 46:721-730 [2009]), biolistic transformation to increase homologous recombination, and Agrobacterium-mediated gene disruption.

Methods to introduce DNA or RNA into fungal cells are known to those of skill in the art and include, but are not limited to, PEG-mediated transformation of protoplasts, electroporation, biolistic transformation (See e.g., Davidson et al., Fung. Genet. Biol., 29:38-48 [2000]), and Agrobacterium-mediated transformation (See e.g., Wang et al., Curr. Genet., 56:297-307 [2010]).

Further methods for complete or partial gene deletion include disruption of the gene. Such gene disruption techniques are known to those of skill in the art, including, but not limited to, insertional mutagenesis, the use of transposons, and marked integration. However, it will be appreciated that any suitable technique that provides for disruption of the coding sequence or any other functional aspect of a gene finds use in generating the genetically modified fungal cells provided herein. Methods of insertional mutagenesis can be performed according to any suitable method known in the art (See e.g., Combier et al., FEMS Microbiol Lett., 220:141-8 [2003], which is incorporated by reference herein in its entirety). In addition, Agrobacterium-mediated insertional mutagenesis can be used to insert a sequence that disrupts the function of the encoded gene, such as disruption of the coding sequence or any other functional aspect of the gene.

Transposon mutagenesis methodologies provide another means for gene disruption. Transposon mutagenesis is well known in the art, and can be performed using in vivo techniques (See e.g., Firon et al., Eukaryot. Cell 2:247-55 [2003]); or by the use of in vitro techniques (See e.g., Adachi et al., Curr. Genet., 42:123-7 [2002]); both of these references are incorporated by reference in their entireties. Thus, targeted gene disruption using transposon mutagenesis can be used to insert a sequence that disrupts the function of the encoded gene, such as disruption of the coding sequence or any other functional aspect of the gene.

Restriction enzyme-mediated integration (REMI) is another methodology for gene disruption, and is well known in the art (See e.g., Thon et al., Mol. Plant Microbe Interact., 13:1356-65 [2000], which is incorporated by reference herein in its entirety). REMI generates insertions into genomic restriction sites in an apparently random manner, some of which cause mutations. Thus, insertional mutants that demonstrate a disruption in the gene encoding the endogenous protease can be selected and utilized as provided herein.

In some other embodiments, the fungal cell has been genetically modified to reduce the catalytic efficiency of the protease. A reduction in catalytic efficiency refers to a reduction in the activity of protease, relative to unmodified protease, as measured using standard techniques known in the art. Thus, a genetic modification that reduces catalytic efficiency can result in, for example, a translated protein product that has a reduction in enzymatic activity.

A reduction in catalytic efficiency is a reduction of protease activity of about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more, relative to unmodified protease, as measured using standard techniques.

In some further embodiments, the genetic modification results in a reduction of protease activity of at least about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% in the total protease activity secreted by the fungal cell, as compared to unmodified protease, as measured using standard techniques.

Methods for reducing catalytic efficiency of proteases are well known, and as such, any of a variety of suitable methods known in the art for reducing catalytic efficiency find use in genetically modifying the fungal cells provided herein. Thus, for example, the fungal cell can be genetically modified to inactivate one or more residues in an active site of the protease. For example, one or more residues can be modified to decrease substrate binding, and/or one or more residues can be modified to decrease the catalytic activity of the protease. Similarly, it will be apparent that mutation of residues outside an active site can result in allosteric change in the shape or activity of the protease, such that the catalytic efficiency of the enzyme is reduced. In some embodiments, other domains are targeted for at least one mutation which results in a reduced catalytic efficiency of at least one endogenous protease.

As provided herein, a fungal cell that has been genetically modified to reduce the activity of at least one protease typically has reduced secreted activity of an endogenous protease. Accordingly, one or more protease enzymes from each of the fungal species described herein can be targeted for genetic modification. In some embodiments, the protease is from a fungal species in the family Chaetomiaceae. In some embodiments, the protease is from a fungal species selected from Sporotrichum cellulophilum, Thielavia heterothallica, Corynascus heterothallicus, Thielavia terrestris, Chaetomium globosum, and Myceliophthora thermophila.

Certain nucleotide sequences encoding proteases, and the amino acid sequences they encode, are provided herein. For example, in one embodiment, the nucleotide sequences (gDNA and cDNA) encoding one Myceliophthora thermophila protease (“Protease #1”) are set forth herein as SEQ ID NOS:1 and 2, and the encoded amino acid sequence is set forth as SEQ ID NO:3. In another embodiment, nucleotide sequences (gDNA and cDNA) encoding another Myceliophthora thermophila protease (“Protease #2”) are set forth herein as SEQ ID NOS:4 and 5, and the encoded amino acid sequence is set forth as SEQ ID NO:6. In yet another embodiment, the nucleotide sequences (gDNA and cDNA) of another Myceliophthora thermophila protease (“Protease #3”) are set forth herein as SEQ ID NOS:7 and 8, and the encoded amino acid sequence is set forth as SEQ ID NO:9. In yet another embodiment, the nucleotide sequences (gDNA and cDNA) of another Myceliophthora thermophila protease (“Protease #4”) are set forth herein as SEQ ID NOS:10 and 11, and the encoded amino acid sequence is set forth as SEQ ID NO:12.

In some embodiments, the protease is encoded by a nucleic acid sequence that is at least about 60%, about 61%, about 62%, about 63%, about 64%, about 65%, about 66%, about 67%, about 68%, about 69%, about 70%, about 71%, about 72%, about 73%, about 74%, about 75%, about 76%, about 77%, about 78%, about 79%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100% identical to SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11. In some embodiments, the protease is encoded by a nucleic acid sequence that is at least about 60%, about 61%, about 62%, about 63%, about 64%, about 65%, about 66%, about 67%, about 68%, about 69%, about 70%, about 71%, about 72%, about 73%, about 74%, about 75%, about 76%, about 77%, about 78%, about 79%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100% identical to a nucleic acid sequence encoding the amino acid sequence set forth as SEQ ID NOS:3, 6, 9, and/or 12. In some embodiments, the protease is encoded by a nucleic acid sequence that can selectively hybridize to SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, and/or 11, under moderately stringent or stringent conditions, as described hereinabove. In some embodiments, the protease is encoded by a nucleic acid sequence that can selectively hybridize under moderately stringent or stringent conditions to a nucleic acid sequence that encodes SEQ ID NOS:3, 6, 9, and/or 12. In some embodiments, the protease comprises an amino acid sequence with at least about 50%, about 51%, about 52%, about 53%, about 54%, about 55%, about 56%, about 57%, about 58%, about 59%, about 60%, about 61%, about 62%, about 63%, about 64%, about 65%, about 66%, about 67%, about 68%, about 69%, about 70%, about 71%, about 72%, about 73%, about 74%, about 75%, about 76%, about 77%, about 78%, about 79%, about 80%, about 81%, about 82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or about 100% similarity to the amino acid sequence set forth as SEQ ID NOS:3, 6, 9, and/or 12. Protease sequences can be identified by any of a variety of methods known in the art. For example, a sequence alignment can be conducted against a database (e.g., the NCBI database), and sequences with the lowest HMM E-value can be selected.
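
By way of non-limiting illustration, a percent identity value of the kind recited above can be computed from a pairwise alignment. The following Python sketch uses one common convention (identical residues divided by aligned columns, with gapped columns counted as mismatches); this convention, the function name, and the toy sequences are assumptions made for the example and do not necessarily reflect the definition of identity set forth elsewhere in this disclosure. The two input strings are assumed to have been aligned already by any suitable alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of a pairwise alignment.

    Columns in which both sequences carry a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (not a sequence from the Sequence Listing).
identity = percent_identity("MKT-LLA", "MKTALIA")
print(f"{identity:.1f}% identical")  # prints "71.4% identical"
print(identity >= 60.0)              # prints "True"
```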

In some embodiments, the fungal cells of the present invention have beengenetically modified to reduce the amount of protease activity from twoor more endogenous protease enzymes secreted by the cell. In someembodiments, a first of the two or more proteases comprises an aminoacid sequence that is at least about 60%, about 61%, about 62%, about63%, about 64%, about 65%, about 66%, about 67%, about 68%, about 69%,about 70%, about 71%, about 72%, about 73%, about 74%, about 75%, about76%, about 77%, about 78%, about 79%, about 80%, about 81%, about 82%,about 83%, about 84%, about 85%, about 86%, about 87%, about 88%, about89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%,about 96%, about 97%, about 98%, or about 99% identical to SEQ ID NO:3,6, 9, or 12, and a second of the two or more protease enzymes comprisesan amino acid sequence that is at least about 60%, about 61%, about 62%,about 63%, about 64%, about 65%, about 66%, about 67%, about 68%, about69%, about 70%, about 71%, about 72%, about 73%, about 74%, about 75%,about 76%, about 77%, about 78%, about 79%, about 80%, about 81%, about82%, about 83%, about 84%, about 85%, about 86%, about 87%, about 88%,about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about95%, about 96%, about 97%, about 98%, or about 99% identical to SEQ IDNO:3, 6, 9, or 12.

As indicated herein, the present invention provides fungal cells from the family Chaetomiaceae that have been genetically modified to reduce the amount of endogenous protease activity that is secreted by the cell, where the fungal cell is capable of secreting a cellulase-containing enzyme mixture. The Chaetomiaceae are a family of fungi in the Ascomycota, class Sordariomycetes. The family Chaetomiaceae includes the genera Achaetomium, Aporothielavia, Chaetomidium, Chaetomium, Corylomyces, Corynascus, Farrowia, Thielavia, Zopfiella, and Myceliophthora. In some embodiments, the genetically modified fungal cell provided herein is a Chaetomiaceae family member selected from Myceliophthora, Thielavia, Corynascus, and Chaetomium.

In some embodiments, the genetically modified fungal cell is an anamorph or teleomorph of a Chaetomiaceae family member selected from Myceliophthora, Thielavia, Corynascus, and Chaetomium. In some embodiments, the genetically modified fungal cell is selected from Sporotrichum, Chrysosporium, Paecilomyces, Talaromyces, and Acremonium. It is also contemplated that the genetically modified fungal cell can be selected from the genera Ctenomyces, Thermoascus, and Scytalidium, including anamorphs and teleomorphs of fungal cells of these genera. In some embodiments, the genetically modified fungal cell is selected from strains of Sporotrichum cellulophilum, Thielavia heterothallica, Corynascus heterothallicus, Thielavia terrestris, and Myceliophthora thermophila, including anamorphs and teleomorphs thereof. It is not intended that the present invention be limited to any particular genus within the Chaetomiaceae family. In some further embodiments, the genetically modified fungal cell is a thermophilic species of Acremonium, Arthroderma, Corynascus, Thielavia, Myceliophthora, Thermoascus, Chromocleista, Byssochlamys, Sporotrichum, Chaetomium, Chrysosporium, Scytalidium, Ctenomyces, Paecilomyces, or Talaromyces. It will be understood that for all of the aforementioned species, the genetically modified fungal cell presented herein encompasses both the perfect and imperfect states, and other taxonomic equivalents (e.g., anamorphs), regardless of the species name by which they are known (See e.g., Cannon, Mycopathol., 111:75-83 [1990]; Moustafa et al., Persoonia 14:173-175 [1990]; Upadhyay et al., Mycopathol., 87:71-80 [1984]; Guarro et al., Mycotaxon 23:419-427 [1985]; Awao et al., Mycotaxon 16:436-440 [1983]; and von Klopotek, Arch. Microbiol., 98:365-369 [1974]). Those skilled in the art will readily recognize the identity of appropriate equivalents. Accordingly, it will be understood that, unless otherwise stated, the use of a particular species designation in the present disclosure also refers to species that are related by anamorphic or teleomorphic relationship.

In some embodiments provided herein, the fungal cell is further genetically modified to increase its production of one or more saccharide hydrolyzing enzymes. For example, in some embodiments, the fungal cell overexpresses a homologous or heterologous gene encoding a saccharide hydrolysis enzyme such as beta-glucosidase. In some embodiments, the one or more saccharide hydrolysis enzymes include a cellulase enzyme described herein. For example, in some embodiments, the enzyme is any one of a variety of endoglucanases, cellobiohydrolases, beta-glucosidases, endoxylanases, beta-xylosidases, arabinofuranosidases, alpha-glucuronidases, acetylxylan esterases, feruloyl esterases, and alpha-glucuronyl esterases, and/or any other enzyme involved in saccharide hydrolysis. In some embodiments, the fungal cell is genetically modified to increase expression of beta-glucosidase. Thus, in some embodiments, the fungal cell comprises a polynucleotide sequence for increased expression of a beta-glucosidase-encoding polynucleotide. In some embodiments, the fungal cell is further genetically modified to delete polynucleotides encoding one or more endogenous protease enzymes.

In some embodiments, the saccharide hydrolyzing enzyme is endogenous tothe fungal cell, while in other embodiments, the saccharide hydrolyzingenzyme is exogenous to the fungal cell. In some additional embodiments,the enzyme mixture further comprises a saccharide hydrolyzing enzymethat is heterologous to the fungal cell. Still further, in someembodiments, the methods for generating glucose comprise contactingcellulose with an enzyme mixture that comprises a saccharide hydrolyzingenzyme that is heterologous to the fungal cell.

In some embodiments, a fungal cell is genetically modified to increasethe expression of a saccharide hydrolysis enzyme using any of a varietyof suitable methods known to those of skill in the art. In someembodiments, the hydrolyzing enzyme-encoding polynucleotide sequence isadapted for increased expression in a host fungal cell. As used herein,a polynucleotide sequence that has been adapted for expression is apolynucleotide sequence that has been inserted into an expression vectoror otherwise modified to contain regulatory elements necessary forexpression of the polynucleotide in the host cell, positioned in such amanner as to permit expression of the polynucleotide in the host cell.Such regulatory elements required for expression include promotersequences, transcription initiation sequences and, optionally, enhancersequences. For example, in some embodiments, a polynucleotide sequenceis inserted into a plasmid vector adapted for expression in the fungalhost cell.

In some embodiments, the genetically modified fungal cells providedherein are cellulase-producing fungal cells. In some embodiments, thecellulase-producing fungal cells express and secrete a mixture ofcellulose hydrolyzing enzymes. In some embodiments, the geneticallymodified fungal cells provided herein are fungal cells from the familyChaetomiaceae that secrete two or more cellulose hydrolyzing enzymes(e.g., endoglucanase, cellobiohydrolase, and/or beta-glucosidase). Insome additional embodiments, the cellulase-producing fungal cellsproduce two or more of these enzymes, in any combination. Additionally,in some embodiments, the genetically modified fungal cell is derivedfrom a lignocellulose-competent parental fungal cell.

The present invention also provides a fungal culture in a vessel comprising a genetically modified fungal cell as described hereinabove. In some embodiments, the vessel comprises a liquid medium, such as fermentation medium. For example, the vessel can be a flask, bioprocess reactor, or any suitable container. In some embodiments, the vessel comprises a solid growth medium. For example, the solid medium can be an agar medium such as potato dextrose agar, carboxymethylcellulose, cornmeal agar, or any other suitable medium. In some embodiments, the fungal cell described hereinabove is an isolated fungal cell.

Enzyme Mixtures

Also provided herein are enzyme mixtures that comprise one or more cellulose hydrolyzing enzymes expressed by a fungal cell that has been genetically modified to reduce the amount of endogenous protease activity secreted by the cell, as described herein. Cellulase enzymes are produced by a wide variety of microorganisms. Cellulases (and hemicellulases) from filamentous fungi and some bacteria are widely exploited for many industrial applications that involve processing of natural fibers to sugars. It is contemplated that mixtures of any enzymes set forth herein will find use in the present invention.

As a further guide to the reader, yet without implying any limitation inthe practice of the present invention, exemplary mixtures of componentsthat may be used as catalysts in a saccharification reaction to generatefermentable sugars from a cellulosic substrate are provided herein.Concentrations are given in wt/vol of each component in the finalreaction volume with the cellulose substrate. Also provided arepercentages of each component (wt/wt) in relation to the total mass ofthe components that are listed for addition into each mixture (the“total protein”). This may be a mixture of purified enzymes and/orenzymes in a culture supernatant.

By way of example, the invention embodies mixtures that comprise at least four, at least five, or all six of the following components. In some embodiments, cellobiohydrolase 1 (CBH1) finds use; in some embodiments, CBH1 is present at a concentration of about 0.14 to about 0.23 g/L (about 15% to about 25% of total protein). Exemplary CBH1 enzymes include, but are not limited to, T. emersonii CBH1 (wild-type) (e.g., SEQ ID NO:137), wild-type M. thermophila CBH1a (e.g., SEQ ID NO:140), and the variants CBH1a-983 (e.g., SEQ ID NO:146) and CBH1a-145 (e.g., SEQ ID NO:143). In some embodiments, cellobiohydrolase 2 (CBH2) finds use; in some embodiments, CBH2 is present at a concentration of about 0.14 to about 0.23 g/L (about 15% to about 25% of total protein). Exemplary CBH2 enzymes include, but are not limited to, wild-type CBH2b from M. thermophila (e.g., SEQ ID NO:149), and/or variants CBH2b var. 196 (e.g., SEQ ID NO:152), CBH2b var. 287 (e.g., SEQ ID NO:155), and CBH2b var. 962 (e.g., SEQ ID NO:158). In some embodiments, endoglucanase 2 (EG2) finds use; in some embodiments, EG2 is present at a concentration of 0 to about 0.05 g/L (0 to about 5% of total protein). Exemplary EG2s include, but are not limited to, wild-type M. thermophila EG2 (e.g., SEQ ID NO:125). In some further embodiments, endoglucanase 1 (EG1) finds use; in some embodiments, EG1 is present at a concentration of about 0.05 to about 0.14 g/L (about 5% to about 15% of total protein). Exemplary EG1s include, but are not limited to, wild-type M. thermophila EG1b (e.g., SEQ ID NO:122). In some embodiments, beta-glucosidase (BGL) finds use in the present invention; in some embodiments, BGL is present at a concentration of about 0.05 to about 0.09 g/L (about 5% to about 10% of total protein). Exemplary beta-glucosidases include, but are not limited to, wild-type M. thermophila BGL1 (e.g., SEQ ID NO:128), as well as variant BGL-900 (e.g., SEQ ID NO:134), and variant BGL-883 (e.g., SEQ ID NO:131). In some further embodiments, GH61 protein and/or protein variants find use; in some embodiments, GH61 enzymes are present at a concentration of about 0.23 to about 0.33 g/L (about 25% to about 35% of total protein). Exemplary GH61s include, but are not limited to, wild-type M. thermophila GH61a (e.g., SEQ ID NO:14), GH61a Variant 1 (e.g., SEQ ID NO:17), GH61a Variant 5 (e.g., SEQ ID NO:20), and/or GH61a Variant 9 (e.g., SEQ ID NO:23), and/or any other GH61a variant proteins, as well as any of the other GH61 enzymes (e.g., GH61b, GH61c, GH61d, GH61e, GH61f, GH61g, GH61h, GH61i, GH61j, GH61k, GH61l, GH61m, GH61n, GH61o, GH61p, GH61q, GH61r, GH61s, GH61t, GH61u, GH61v, GH61w, GH61x, and/or GH61y) as provided herein (e.g., polynucleotide and polypeptide sequences including, but not limited to, SEQ ID NOS:25-120).

In some embodiments, one, two or more than two enzymes are present in the mixtures of the present invention. In some embodiments, GH61p is present at a concentration of about 0.05 to about 0.14 g/L (e.g., about 1% to about 15% of total protein). Exemplary M. thermophila GH61p enzymes include, but are not limited to, those set forth in SEQ ID NOS:82 and 85. In some embodiments, GH61f is present at a concentration of about 0.05 to about 0.14 g/L (about 1% to about 15% of total protein). An exemplary M. thermophila GH61f is set forth in SEQ ID NO:41. In some additional embodiments, at least one additional GH61 enzyme provided herein (e.g., GH61b, GH61c, GH61d, GH61e, GH61g, GH61h, GH61i, GH61j, GH61k, GH61l, GH61m, GH61n, GH61o, GH61q, GH61r, GH61s, GH61t, GH61u, GH61v, GH61w, GH61x, and/or GH61y) finds use at an appropriate concentration (e.g., about 0.05 to about 0.14 g/L [about 1% to about 15% of total protein]).

In some embodiments, at least one xylanase at a concentration of about 0.05 to about 0.14 g/L (about 1% to about 15% of total protein) finds use in the present invention. Exemplary xylanases include but are not limited to the M. thermophila xylanase-3 (SEQ ID NO:161), xylanase-2 (SEQ ID NO:164), xylanase-1 (SEQ ID NO:167), xylanase-6 (SEQ ID NO:170), and xylanase-5 (SEQ ID NO:173).

In some additional embodiments, at least one beta-xylosidase at a concentration of about 0.05 to about 0.14 g/L (e.g., about 1% to about 15% of total protein) finds use in the present invention. Exemplary beta-xylosidases include but are not limited to the M. thermophila beta-xylosidase (SEQ ID NO:176).

In still some additional embodiments, at least one acetylxylan esterase at a concentration of about 0.05 to about 0.14 g/L (e.g., about 1% to about 15% of total protein) finds use in the present invention. Exemplary acetylxylan esterases include but are not limited to the M. thermophila acetylxylan esterase (SEQ ID NO:179).

In some further additional embodiments, at least one ferulic acid esterase at a concentration of about 0.05 to about 0.14 g/L (e.g., about 1% to about 15% of total protein) finds use in the present invention. Exemplary ferulic acid esterases include but are not limited to the M. thermophila ferulic acid esterase (SEQ ID NO:182).

In some embodiments, the enzyme mixtures comprise at least one GH61variant protein as provided herein and at least one cellulase, includingbut not limited to any of the enzymes described herein. In someembodiments, the enzyme mixtures comprise at least one GH61 variantprotein and at least one wild-type GH61 protein. In some embodiments,the enzyme mixtures comprise at least one GH61 variant protein and atleast one non-cellulase enzyme. Indeed, it is intended that anycombination of enzymes will find use in the enzyme compositionscomprising at least one GH61 variant of the present invention.

The concentrations listed above are appropriate for a final reaction volume with the biomass substrate in which the total of all of the components listed (the “total protein”) is about 0.75 g/L, and the amount of glucan is about 93 g/L, subject to routine optimization. The user may empirically adjust the amount of each component and total protein for cellulosic substrates that have different characteristics and/or are processed at a different concentration. Any one or more of the components may be supplemented or substituted with variants with common structural and functional characteristics, as described below.

Without implying any limitation, the following mixtures further describesome embodiments of the present invention.

Some mixtures comprise CBH1a within a range of about 15% to about 30% of total protein, typically about 20% to about 25%; CBH2 within a range of about 15% to about 30%, typically about 17% to about 22%; EG2 within a range of about 1% to about 10%, typically about 2% to about 5%; BGL1 within a range of about 5% to about 15%, typically about 8% to about 12%; GH61a within a range of about 10% to about 40%, typically about 20% to about 30%; EG1b within a range of about 5% to about 25%, typically about 10% to about 18%; and GH61f within a range of 0% to about 30%, typically about 5% to about 20%.
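
By way of non-limiting illustration, a candidate composition can be checked against the ranges recited above and converted to concentrations. In the following Python sketch, the range table is transcribed from the preceding paragraph; the function names, the example composition, the requirement that the listed components sum to 100%, and the use of 0.75 g/L total protein are assumptions made for the example. Because the recited ranges are approximate (“about”), the hard limits used here are a simplification.

```python
# Ranges in percent of total protein, transcribed from the paragraph above.
BROAD_RANGES = {
    "CBH1a": (15.0, 30.0),
    "CBH2": (15.0, 30.0),
    "EG2": (1.0, 10.0),
    "BGL1": (5.0, 15.0),
    "GH61a": (10.0, 40.0),
    "EG1b": (5.0, 25.0),
    "GH61f": (0.0, 30.0),
}


def check_mixture(percentages, ranges=BROAD_RANGES, tolerance=1e-6):
    """Return a list of problems; an empty list means the composition fits."""
    problems = []
    for name, (low, high) in ranges.items():
        value = percentages.get(name, 0.0)
        if not low <= value <= high:
            problems.append(f"{name}: {value}% is outside {low}% to {high}%")
    total = sum(percentages.values())
    # Assumption: the listed components make up all of the "total protein".
    if abs(total - 100.0) > tolerance:
        problems.append(f"components sum to {total}%, not 100%")
    return problems


def grams_per_liter(percentages, total_protein_g_per_l):
    """Convert percent of total protein to g/L in the final reaction volume."""
    return {name: pct / 100.0 * total_protein_g_per_l
            for name, pct in percentages.items()}


# Hypothetical composition chosen for illustration only.
example = {"CBH1a": 22.0, "CBH2": 20.0, "EG2": 3.0, "BGL1": 10.0,
           "GH61a": 25.0, "EG1b": 12.0, "GH61f": 8.0}
print(check_mixture(example))  # prints []
print(grams_per_liter(example, total_protein_g_per_l=0.75))
```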

In some mixtures, exemplary BGL1s include the BGL1 variant 900 (SEQ IDNO:134) and/or variant 883 (SEQ ID NO:131). In some embodiments, otherenzymes are M. thermophila wild-type: CBH1a (SEQ ID NO:140), CBH2b (SEQID NO:149), EG2 (SEQ ID NO:125), GH61a (SEQ ID NO:14), EG1b (SEQ IDNO:122) and GH61f (SEQ ID NO:41). Any one or more of the components maybe supplemented or substituted with variants having common structuraland functional characteristics with the component being substituted orsupplemented, as described below. In a saccharification reaction, theamount of glucan is generally about 50 to about 300 g/L, typically about75 to about 150 g/L. The total protein is about 0.1 to about 10 g/L,typically about 0.5 to about 2 g/L, or about 0.75 g/L.

Some mixtures comprise CBH1 within a range of about 10% to about 30%, typically about 15% to about 25%; CBH2b within a range of about 10% to about 25%, typically about 15% to about 20%; EG2 within a range of about 1% to about 10%, typically about 2% to about 5%; EG1b within a range of about 2% to about 25%, typically about 6% to about 14%; GH61a within a range of about 5% to about 50%, typically about 10% to about 35%; and BGL1 within a range of about 2% to about 15%, typically about 5% to about 12%. Also included is copper sulfate to generate a final concentration of Cu⁺⁺ of about 4 μM to about 200 μM, typically about 25 μM to about 60 μM. However, it is not intended that the added copper be limited to any particular concentration, as any suitable concentration finds use in the present invention and will be determined based on the reaction conditions.
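
By way of non-limiting illustration, the amount of copper sulfate needed to reach a given final Cu⁺⁺ concentration can be estimated from its molar mass. The following Python sketch assumes that the salt dissolves completely and contributes one Cu⁺⁺ per formula unit; the hydrate form (this disclosure states only “copper sulfate”), the approximate molar masses, and the function name are assumptions made for the example.

```python
# Approximate molar masses in g/mol; which form is used is an assumption.
MOLAR_MASS_G_PER_MOL = {
    "CuSO4": 159.61,        # anhydrous copper(II) sulfate
    "CuSO4.5H2O": 249.69,   # copper(II) sulfate pentahydrate
}


def copper_sulfate_mg_per_liter(target_cu_micromolar: float,
                                form: str = "CuSO4.5H2O") -> float:
    """Mass of salt (mg) per liter of reaction to reach the target Cu concentration."""
    molar_mass = MOLAR_MASS_G_PER_MOL[form]
    # micromol/L multiplied by g/mol gives micrograms/L; divide by 1000 for mg/L.
    return target_cu_micromolar * molar_mass / 1000.0


# Example: a target of 50 micromolar, within the typical range recited above.
print(f"{copper_sulfate_mg_per_liter(50.0):.2f} mg/L")  # prints "12.48 mg/L"
```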

In an additional mixture, an exemplary CBH1 is wild-type CBH1 from T.emersonii (SEQ ID NO:137), as well as wild-type M. thermophila CBH1a(SEQ ID NO:140), Variant 983 (SEQ ID NO:146), and Variant 145 (SEQ IDNO:143); exemplary CBH2 enzymes include the wild-type (SEQ ID NO:149),Variant 962 (SEQ ID NO:158), Variant 196 (SEQ ID NO:152), and Variant287 (SEQ ID NO:155); an exemplary EG2 is the wild-type M. thermophila(SEQ ID NO:125); an exemplary EG1b is the wild-type (SEQ ID NO: 122);exemplary GH61a enzymes include wild-type M. thermophila (SEQ ID NO:14),Variant 1 (SEQ ID NO:17), Variant 5 (SEQ ID NO:20), and Variant 9 (SEQID NO:23); and exemplary BGLs include wild-type M. thermophila BGL (SEQID NO:128), Variant 883 (SEQ ID NO:131), and Variant 900 (SEQ IDNO:134). Any one or more of the components may be supplemented orsubstituted with other variants having common structural and functionalcharacteristics with the component being substituted or supplemented, asdescribed below. In a saccharification reaction, the amount of glucan isgenerally about 50 to about 300 g/L, typically about 75 to about 150g/L. The total protein is about 0.1 to about 10 g/L, typically about 0.5to about 2 g/L, or about 0.75 g/L.

Any or all of the components listed in the mixtures referred to abovemay be supplemented or substituted with variant proteins that arestructurally and functionally related, as described herein.

In some embodiments, the CBH1 cellobiohydrolase used in mixtures of thepresent invention comprises at least about 80%, at least about 85%, atleast about 90%, at least about 91%, at least about 92%, at least about93%, at least about 94%, at least about 95%, at least about 96%, atleast about 97%, at least about 98%, at least about 99%, or 100%identical to either SEQ ID NO:140 (M. thermophila), SEQ ID NO:137 (T.emersonii), or a fragment of either SEQ ID NO:140 or SEQ ID NO:137having cellobiohydrolase activity, as well as variants of M. thermophilaCBH1a (e.g., SEQ ID NO:143 and/or SEQ ID NO:146), and/or variantfragment(s) having cellobiohydrolase activity. Exemplary CBH1 enzymesinclude, but are not limited to those described in US Pat. Appln. Publn.No. 2012/0003703 A1, which is hereby incorporated herein by reference inits entirety for all purposes.

In some embodiments, the CBH2b cellobiohydrolase used in the mixtures ofthe present invention comprises at least about 80%, at least about 85%,at least about 90%, at least about 91%, at least about 92%, at leastabout 93%, at least about 94%, at least about 95%, at least about 96%,at least about 97%, at least about 98%, at least about 99%, or 100%identical to SEQ ID NO:149 and/or a fragment of SEQ ID NO:149, as wellas at least one variant M. thermophila CBH2b enzyme (e.g., SEQ IDNO:152, 155, and/or 158) and/or variant fragment(s) havingcellobiohydrolase activity. Exemplary CBH2b enzymes are described inU.S. Pat. Appln. Ser. No. 61/479,800, and Ser. No. 13/459,038, both ofwhich are hereby incorporated herein by reference in their entirety forall purposes.

In some embodiments, the EG2 endoglucanase used in the mixtures of thepresent invention comprises at least about 80%, at least about 85%, atleast about 90%, at least about 91%, at least about 92%, at least about93%, at least about 94%, at least about 95%, at least about 96%, atleast about 97%, at least about 98%, at least about 99%, or 100%identical to SEQ ID NO:125 and/or a fragment of SEQ ID NO:125 havingendoglucanase activity. Exemplary EG2 enzymes are described in U.S.patent application Ser. No. 13/332,114, and WO 2012/088159, both ofwhich are hereby incorporated herein by reference in their entirety forall purposes.

In some embodiments, the EG1b endoglucanase used in the mixtures of thepresent invention comprises at least about 80%, at least about 85%, atleast about 90%, at least about 91%, at least about 92%, at least about93%, at least about 94%, at least about 95%, at least about 96%, atleast about 97%, at least about 98%, at least about 99%, or 100%identical to SEQ ID NO:122 and/or a fragment of SEQ ID NO:122 havingendoglucanase activity.

In some embodiments, the BGL1 beta-glucosidase used in the mixtures of the present invention comprises at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or 100% identical to SEQ ID NOS:128, 131, and/or 134, or a fragment of SEQ ID NOS:128, 131, and/or 134 having beta-glucosidase activity. Exemplary BGL1 enzymes include, but are not limited to, those described in US Pat. Appln. Publ. No. 2011/0129881, WO 2011/041594, and US Pat. Appln. Publ. No. 2011/0124058 A1, all of which are hereby incorporated herein by reference in their entireties for all purposes.

In some embodiments, the GH61f protein used in the mixtures of thepresent invention comprises at least about 80%, at least about 85%, atleast about 90%, at least about 91%, at least about 92%, at least about93%, at least about 94%, at least about 95%, at least about 96%, atleast about 97%, at least about 98%, at least about 99%, or 100%identical to SEQ ID NO:41, and/or a fragment of SEQ ID NO:41 having GH61activity, assayed as described elsewhere in this disclosure.

In some embodiments, the GH61p protein used in the mixtures of thepresent invention comprises at least about 80%, at least about 85%, atleast about 90%, at least about 91%, at least about 92%, at least about93%, at least about 94%, at least about 95%, at least about 96%, atleast about 97%, at least about 98%, at least about 99%, or 100%identical to SEQ ID NO:82, SEQ ID NO:85, and/or a fragment of suchsequence having GH61p activity.

In some embodiments, the xylanase used in the mixtures of the presentinvention comprises at least about 80%, at least about 85%, at leastabout 90%, at least about 91%, at least about 92%, at least about 93%,at least about 94%, at least about 95%, at least about 96%, at leastabout 97%, at least about 98%, at least about 99%, or 100% identical toSEQ ID NO:161, SEQ ID NO:164, SEQ ID NO:167, SEQ ID NO:170, and/or SEQID NO:173, and/or a fragment of such sequence having xylanase activity.

In some embodiments, the xylosidase used in the mixtures of the presentinvention comprises at least about 80%, at least about 85%, at leastabout 90%, at least about 91%, at least about 92%, at least about 93%,at least about 94%, at least about 95%, at least about 96%, at leastabout 97%, at least about 98%, at least about 99%, or 100% identical toSEQ ID NO:176 and/or a fragment of such sequence having xylosidaseactivity.

In some embodiments, the enzyme mixture comprises one or more cellulose hydrolyzing enzymes expressed by a fungal cell that has been genetically modified to reduce the amount of endogenous protease activity that is secreted by the cell, as described herein. In some embodiments, the fungal cell is a lignocellulose-utilizing cell from the family Chaetomiaceae. In some embodiments, the genetically modified fungal cell provided herein is a Chaetomiaceae family member selected from Myceliophthora, Thielavia, Corynascus, or Chaetomium. In some other embodiments, the genetically modified fungal cell can also be an anamorph or teleomorph of a Chaetomiaceae family member selected from Myceliophthora, Thielavia, Corynascus, or Chaetomium. In addition, the genetically modified fungal cell can also be selected from Sporotrichum, Acremonium, or Talaromyces. It is also contemplated that the genetically modified fungal cell be selected from Ctenomyces, Thermoascus, and Scytalidium, including anamorphs and teleomorphs of fungal cells from those genera. In some embodiments, the fungal cell is a species selected from Sporotrichum cellulophilum, Thielavia heterothallica, Corynascus heterothallicus, Thielavia terrestris, Chaetomium globosum, Talaromyces stipitatus, Talaromyces emersonii, and Myceliophthora thermophila, including anamorphs and teleomorphs thereof.

In some embodiments, at least one cellulase in the mixtures of thepresent invention is produced by any suitable organism. In someembodiments, at least one cellulase in the mixtures is produced byAcidothermus cellulolyticus, Thermobifida fusca, Humicola grisea,Myceliophthora thermophila, Chaetomium thermophilum, Acremonium sp.,Thielavia sp, Trichoderma reesei, Aspergillus sp., or Chrysosporium sp.,and/or at least one enzyme produced in a heterologous organism. Indeed,it is not intended that the present invention be limited to enzymesproduced by protease-deficient Myceliophthora. The present inventionencompasses enzyme mixtures comprising enzymes produced byMyceliophthora in combination with at least one cellulase and/or otherenzymes produced by any other suitable organisms, wherein at least onecellulase and/or enzyme is either homologous or heterologous to the cellproducing the cellulase(s) and/or other enzyme(s). In some embodiments,the enzyme mixtures comprise bacterial, as well as fungal enzymes. Insome embodiments, bacterial enzymes produced by and/or from organismssuch as Bacillus find use. However, it is not intended that the presentinvention be limited to any particular bacterial organism and/or anyparticular bacterial enzyme, as any suitable organisms and/or enzymesfind use in the present invention. In some embodiments, cellulaseenzymes of the cellulase mixture work together, resulting indecrystallization and hydrolysis of the cellulose from a biomasssubstrate to yield fermentable sugars, such as but not limited toglucose.

In some embodiments, the enzyme mixture is contained in a vessel comprising a genetically modified fungal cell as described herein. In some embodiments, the vessel comprises a liquid medium. In some embodiments, the vessel is a flask, bioprocess reactor, or any other suitable container. In some embodiments, the enzyme mixture is in a liquid volume. In some embodiments, the liquid volume can be greater than about 0.01 mL, about 0.1 mL, about 1 mL, about 10 mL, about 100 mL, about 1000 mL, or greater than about 10 L, about 50 L, about 100 L, about 200 L, about 300 L, about 400 L, about 500 L, about 600 L, about 700 L, about 800 L, about 900 L, about 1000 L, about 10,000 L, about 50,000 L, about 100,000 L, about 250,000 L, about 500,000 L or greater than about 1,000,000 L.

In addition to the enzymes described above, other enzymes such as laccases find use in the mixtures of the present invention. Laccases are copper-containing oxidase enzymes that are found in many plants, fungi and microorganisms. Laccases are enzymatically active on phenols and similar molecules and perform a one-electron oxidation. Laccases can be polymeric and the enzymatically active form can be a dimer or trimer.

Mn-dependent peroxidases also find use in the mixtures of the present invention. The enzymatic activity of Mn-dependent peroxidase (MnP) is dependent on Mn²⁺. Without being bound by theory, it has been suggested that the main role of this enzyme is to oxidize Mn²⁺ to Mn³⁺ (See e.g., Glenn et al. Arch. Biochem. Biophys., 251:688-696 [1986]). Subsequently, phenolic substrates are oxidized by the Mn³⁺ generated.

Lignin peroxidases also find use in the mixtures of the present invention. Lignin peroxidase (LiP) is an extracellular heme protein that catalyzes the oxidative depolymerization of dilute solutions of polymeric lignin in vitro. Some of the substrates of LiP, most notably 3,4-dimethoxybenzyl alcohol (veratryl alcohol, VA), are active redox compounds that have been shown to act as redox mediators. VA is a secondary metabolite produced at the same time as LiP by ligninolytic cultures of P. chrysosporium and, without being bound by theory, has been proposed to function as a physiological redox mediator in the LiP-catalysed oxidation of lignin in vivo (See e.g., Harvey et al., FEBS Lett., 195:242-246 [1986]).

In some embodiments, it may be advantageous to utilize an enzyme mixture that is cell-free. A cell-free enzyme mixture typically comprises enzymes that have been separated from any cells, including the cells that secreted the enzymes. Cell-free enzyme mixtures can be prepared using any of a variety of suitable methodologies that are known in the art (e.g., filtration or centrifugation). In some embodiments, the enzyme mixture is partially cell-free, substantially cell-free, or entirely cell-free.

In some embodiments, two or more cellulases and any additional enzymes present in the cellulase enzyme mixture are secreted from a single genetically modified fungal cell or by different microbes in combined or separate fermentations. Similarly, two or more cellulases and any additional enzymes present in the cellulase enzyme mixture may be expressed individually or in sub-groups from different strains of different organisms and the enzymes combined in vitro to make the cellulase enzyme mixture. It is also contemplated that the cellulases and any additional enzymes in the enzyme mixture are expressed individually or in sub-groups from different strains of a single organism, and the enzymes combined to make the cellulase enzyme mixture.

In some embodiments, the enzyme mixture comprises at least one or more cellulose hydrolyzing enzymes expressed by a fungal cell that has been genetically modified to reduce the amount of endogenous protease activity that is secreted by the cell, as described herein. In some embodiments, the fungal cell is a lignocellulose-utilizing cell from the family Chaetomiaceae. In some embodiments, the genetically modified fungal cell provided herein is a Chaetomiaceae family member selected from Myceliophthora, Thielavia, Corynascus, and Chaetomium. The genetically modified fungal cell can also be an anamorph or teleomorph of a Chaetomiaceae family member selected from Myceliophthora, Thielavia, Corynascus, and Chaetomium. In addition, the genetically modified fungal cell can also be selected from Sporotrichum, Acremonium, Ctenomyces, Scytalidium and Thermoascus, including anamorphs and teleomorphs of fungal cells from these genera. In some embodiments, the fungal cell is a species selected from Sporotrichum cellulophilum, Thielavia heterothallica, Corynascus heterothallicus, Thielavia terrestris, Chaetomium globosum, Talaromyces stipitatus, Talaromyces emersonii, and Myceliophthora thermophila, including anamorphs and teleomorphs thereof.

In some embodiments, the cellulase enzyme mixture of the present invention is produced in a fermentation process in which the fungal cells described herein are grown in submerged liquid culture fermentation. In some embodiments, submerged liquid fermentations of fungal cells are incubated using batch, fed-batch or continuous processing. In a batch process, all the necessary materials, with the exception of oxygen for aerobic processes, are placed in a reactor at the start of the operation and the fermentation is allowed to proceed until completion, at which point the product is harvested. In some embodiments, batch processes for producing the enzyme mixture of the present invention are carried out in a shake-flask or a bioreactor. In some embodiments in which a fed-batch process is used, the culture is fed continuously or sequentially with one or more media components without the removal of the culture fluid. In continuous processes, fresh medium is supplied and culture fluid is removed continuously at volumetrically equal rates to maintain the culture at a steady growth rate. Those of skill in the art will appreciate that fermentation medium is typically liquid, and comprises a carbon source, a nitrogen source as well as other nutrients, vitamins and minerals which can be added to the fermentation media to improve growth and enzyme production of the fungal cells. These other media components may be added prior to, simultaneously with or after inoculation of the culture with the fungal cells.
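By way of illustration only, the three operating modes described above can be distinguished by a simple liquid volume balance. The symbols used here (culture volume V, inlet and outlet volumetric flow rates F_in and F_out, dilution rate D, and specific growth rate μ) are introduced for this sketch and do not appear elsewhere in this disclosure; the steady-state relation assumes an ideal, well-mixed vessel with constant liquid density.

```latex
\frac{dV}{dt} = F_{\mathrm{in}} - F_{\mathrm{out}}
\qquad
\begin{cases}
\text{batch:} & F_{\mathrm{in}} = F_{\mathrm{out}} = 0, \quad V \text{ constant} \\
\text{fed-batch:} & F_{\mathrm{in}} > 0,\ F_{\mathrm{out}} = 0, \quad V \text{ increases} \\
\text{continuous:} & F_{\mathrm{in}} = F_{\mathrm{out}} = F, \quad V \text{ constant}
\end{cases}
\qquad
D = \frac{F}{V}, \quad \mu = D \ \text{at steady state}
```

Under these assumptions, supplying and removing liquid at volumetrically equal rates holds the culture volume constant, and the specific growth rate at steady state equals the dilution rate.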

In some embodiments of the process for producing the enzyme mixture of the present invention, the carbon source comprises a carbohydrate that will induce the expression of the cellulase enzymes from the fungal cell. For example, in some embodiments, the carbon source comprises one or more of cellulose, cellobiose, sophorose, xylan, xylose, xylobiose, and/or related oligo- or poly-saccharides known to induce expression of cellulases and beta-glucosidase in such fungal cells. In some embodiments utilizing batch fermentation, the carbon source is added to the fermentation medium prior to or simultaneously with inoculation. In some embodiments utilizing fed-batch or continuous operations, the carbon source is supplied continuously or intermittently during the fermentation process. For example, in some embodiments, the carbon source is supplied at a carbon feed rate of between about 0.2 and about 2.5 g carbon/L of culture/h, or any suitable amount therebetween.
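By way of illustration only, the carbon feed rate stated above (g carbon/L of culture/h) can be converted into a total hourly feed for a given culture volume. The following is a minimal sketch; the function names, the example volume, and the carbon mass fraction parameter are hypothetical and are not part of this disclosure.

```python
# Illustrative sketch only: all names and example values are assumptions.

LOW_RATE = 0.2   # g carbon / L of culture / h (lower end of the stated range)
HIGH_RATE = 2.5  # g carbon / L of culture / h (upper end of the stated range)


def carbon_feed_g_per_h(rate_g_c_per_l_h: float, culture_volume_l: float) -> float:
    """Return total carbon supplied per hour (g carbon/h) for the whole culture."""
    if rate_g_c_per_l_h <= 0 or culture_volume_l <= 0:
        raise ValueError("rate and volume must be positive")
    return rate_g_c_per_l_h * culture_volume_l


def substrate_feed_g_per_h(rate_g_c_per_l_h: float, culture_volume_l: float,
                           carbon_mass_fraction: float) -> float:
    """Return grams of carbon source per hour, given the fraction of its mass
    that is carbon (a user-supplied, hypothetical parameter)."""
    if not 0 < carbon_mass_fraction <= 1:
        raise ValueError("carbon_mass_fraction must be in (0, 1]")
    return carbon_feed_g_per_h(rate_g_c_per_l_h, culture_volume_l) / carbon_mass_fraction


def within_stated_range(rate_g_c_per_l_h: float) -> bool:
    """Check a rate against the approximate 0.2 to 2.5 g carbon/L/h range."""
    return LOW_RATE <= rate_g_c_per_l_h <= HIGH_RATE


if __name__ == "__main__":
    # Example: a hypothetical 100 L culture fed at 1.0 g carbon/L/h.
    print(carbon_feed_g_per_h(1.0, 100.0), "g carbon per hour")
```

Because the stated bounds are approximate ("about"), the range check is offered as a convenience rather than a hard limit.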

The methods for producing and/or utilizing the enzyme mixture(s) of the present invention may be carried out at any suitable temperature, typically from about 20° C. to about 100° C., or any suitable temperature therebetween, for example from about 20° C. to about 80° C., 25° C. to about 65° C., or any suitable temperature therebetween, or from about 20° C., about 22° C., about 25° C., about 26° C., about 27° C., about 28° C., about 29° C., about 30° C., about 32° C., about 35° C., about 37° C., about 40° C., about 45° C., about 50° C., about 55° C., about 60° C., about 65° C., about 70° C., about 75° C., about 80° C., about 85° C., about 90° C., about 95° C., and/or any suitable temperature therebetween.

The methods for producing and/or utilizing the enzyme mixture(s) of the present invention may be carried out at any suitable pH, typically from about 3.0 to 8.0, or any suitable pH therebetween, for example from about pH 3.5 to pH 6.8, or any suitable pH therebetween, for example from about pH 3.0, about 3.2, about 3.4, about 3.5, about 3.7, about 3.8, about 4.0, about 4.1, about 4.2, about 4.3, about 4.4, about 4.5, about 4.6, about 4.7, about 4.8, about 4.9, about 5.0, about 5.2, about 5.4, about 5.5, about 5.7, about 5.8, about 6.0, about 6.2, about 6.5, about 6.8, about 7.0, about 7.2, about 7.5, about 8.0, or any suitable pH therebetween.

In some embodiments, following fermentation, the fermentation medium containing the fungal cells is used, or the fermentation medium containing the fungal cells and an exogenously supplied enzyme mixture is used, or the enzyme mixture is separated from the fungal cells, for example by filtration or centrifugation, and the enzyme mixture in the fermentation medium is used. In some embodiments, low molecular weight solutes such as unconsumed components of the fermentation medium are removed by ultrafiltration. In some embodiments, the enzyme mixture is concentrated by evaporation, precipitation, sedimentation, filtration, or any suitable means. In some embodiments, chemicals such as glycerol, sucrose, sorbitol, etc., are added to stabilize the enzyme mixture. In some embodiments, other chemicals, such as sodium benzoate or potassium sorbate, are added to the enzyme mixture to prevent growth of microbial contaminants.

The present invention also provides processes for generating glucose, comprising contacting cellulose with the enzyme mixture described herein. For example, in some embodiments, the process comprises contacting cellulose with an enzyme mixture comprising two or more cellulose hydrolyzing enzymes, wherein at least one of the two or more cellulose hydrolyzing enzymes is expressed by a fungal cell as described herein. In some embodiments, the method for generating glucose from cellulose using the enzyme mixture is batch hydrolysis, continuous hydrolysis, or a combination thereof. In some embodiments, the hydrolysis is agitated, unmixed, or a combination thereof.

Fermentation

In some embodiments, methods for generating sugar(s) described herein further comprise fermentation of the resultant sugar(s) to an end product. Fermentation involves the conversion of a sugar source (e.g., a soluble sugar) to an end product through the use of a fermenting organism. Any suitable organism finds use in the present invention, including bacterial and fungal organisms (e.g., yeast and filamentous fungi), suitable for producing a desired end product. Especially suitable fermenting organisms are able to ferment (i.e., convert) sugars, such as glucose, fructose, maltose, xylose, mannose and/or arabinose, directly or indirectly into a desired end product. Examples of fermenting organisms include fungal organisms such as yeast. In some embodiments, yeast strains, including but not limited to the following genera, find use: the genus Saccharomyces (e.g., S. cerevisiae and S. uvarum); Pichia (e.g., P. stipitis and P. pastoris); and Candida (e.g., C. utilis, C. arabinofermentans, C. diddensii, C. sonorensis, C. shehatae, C. tropicalis, and C. boidinii). Other fermenting organisms include, but are not limited to, strains of Zymomonas, Hansenula (e.g., H. polymorpha and H. anomala), Kluyveromyces (e.g., K. fragilis), and Schizosaccharomyces (e.g., S. pombe).

In some embodiments, the fermenting organisms are strains of Escherichia (e.g., E. coli), Zymomonas (e.g., Z. mobilis), Zymobacter (e.g., Z. palmae), Klebsiella (e.g., K. oxytoca), Leuconostoc (e.g., L. mesenteroides), Clostridium (e.g., C. butyricum), Enterobacter (e.g., E. aerogenes) and Thermoanaerobacter (e.g., Thermoanaerobacter BG1L1 [See e.g., Georgieva and Ahring, Appl. Microbiol. Biotech., 77:61-86], T. ethanolicus, T. thermosaccharolyticum, or T. mathranii), Lactobacillus, Corynebacterium glutamicum strain R, Bacillus thermoglucosidasius, and Geobacillus thermoglucosidasius. It is not intended that the fermenting organism be limited to these particular strains, as any suitable organism finds use in the present invention.

The fermentation conditions depend on the desired fermentation product and can easily be determined by one of ordinary skill in the art. In some embodiments involving ethanol fermentation by yeast, fermentation is typically ongoing for between about 1 hour and about 120 hours, or about 12 to about 96 hours. In some embodiments, the fermentation is carried out at a temperature between about 20° C. and about 40° C., or between about 26° C. and about 34° C., or about 32° C. In some embodiments, the fermentation pH is from about pH 3 to about pH 7, while in some other embodiments, the pH is about 4 to about 6.

In some embodiments, enzymatic hydrolysis and fermentation are conducted in separate vessels, so that each biological reaction can occur under its respective optimal conditions (e.g., temperature). In some other embodiments, the methods for producing glucose from cellulose are conducted simultaneously with fermentation in a simultaneous saccharification and fermentation (i.e., “SSF”) reaction. In some embodiments, SSF is typically carried out at temperatures of about 28° C. to about 50° C., or about 30° C. to about 40° C., or about 35° C. to about 38° C., which is a compromise between the about 50° C. optimum for most cellulase enzyme mixtures and the about 28° C. to about 30° C. optimum for most yeast.

In some embodiments, the methods for generating glucose further comprise fermentation of the glucose to a desired end product. It is not intended that the methods provided herein be limited to the production of any specific end product. In some embodiments, end products include fuel alcohols or precursor industrial chemicals. For example, in some embodiments, fermentation products include precursor industrial chemicals such as alcohols (e.g., ethanol, methanol and/or butanol); organic acids (e.g., butyric acid, citric acid, acetic acid, itaconic acid, lactic acid, and/or gluconic acid); ketones (e.g., acetone); amino acids (e.g., glutamic acid); gases (e.g., H₂ and/or CO₂); antimicrobials (e.g., penicillin and/or tetracycline); enzymes; vitamins (e.g., riboflavin, B₁₂, and/or beta-carotene); and/or hormones. In some embodiments, the end product is a fuel alcohol. Suitable fuel alcohols are known in the art and include, but are not limited to, lower alcohols such as methanol, ethanol, butanol and propyl alcohols.

EXPERIMENTAL

The present invention is described in further detail in the following Examples, which are not in any way intended to limit the scope of the invention as claimed.

In the experimental disclosure below, the following abbreviations apply: ppm (parts per million); M (molar); mM (millimolar); uM and μM (micromolar); nM (nanomolar); mol (moles); gm and g (gram); mg (milligrams); ug and μg (micrograms); L and l (liter); ml and mL (milliliter); cm (centimeters); mm (millimeters); um and μm (micrometers); sec. (seconds); min(s) (minute(s)); h(s) (hour(s)); U (units); MW (molecular weight); rpm (rotations per minute); ° C. (degrees Centigrade); wt % (weight percent); w.r.t. (with regard to); DNA (deoxyribonucleic acid); RNA (ribonucleic acid); gDNA (genomic DNA); cDNA (complementary DNA); HPLC (high pressure liquid chromatography); MS (mass spectroscopy); LC (liquid chromatography); LC/MS (liquid chromatography/mass spectroscopy); LC/MS/MS (liquid chromatography/multi-stage mass spectroscopy); HMF (hydroxymethylfurfural); YPD (Yeast extract 10 g/L; Peptone 20 g/L; Dextrose 20 g/L); DCPIP (2,6-dichlorophenolindophenol); CV (column volume); NREL (National Renewable Energy Laboratory, Golden, Colo.); ARS (ARS Culture Collection or NRRL Culture Collection, Peoria, Ill.); Lallemand (Lallemand Ethanol Technology, Milwaukee, Wis.); Cayla (Cayla-InvivoGen, Toulouse, France); Agilent New Brunswick (New Brunswick Scientific Co., Edison, N.J.); Agilent Technologies (Agilent Technologies, Inc., Santa Clara, Calif.); Sigma (Sigma Aldrich, St. Louis, Mo.); Qiagen (Qiagen, Inc., Valencia, Calif.); Eppendorf (Eppendorf AG, Hamburg, Germany); GE Healthcare (GE Healthcare, Waukesha, Wis.); Bruker Optics (Bruker Optics, Inc., Billerica, Mass.); Specac (Specac, Inc., Cranston, R.I.); Invitrogen (Invitrogen, Corp., Carlsbad, Calif.); Alphalyse (Alphalyse, Inc., Palo Alto, Calif.); Promega (Promega, Corp., Madison, Wis.); Sartorius (Sartorius-Stedim Biotech, SA, Aubagne, France); Finnzymes (Finnzymes Oy, Espoo, FI [part of Thermo Fisher Scientific]); CalBiochem (CalBiochem, EMD Chemicals, Inc., Gibbstown, N.J.); and Bio-Rad (Bio-Rad Laboratories, Hercules, Calif.).

Genomic, cDNA, and amino acid sequences of the proteases of the present invention, including Protease #1 (SEQ ID NOS:1-3), Protease #2 (SEQ ID NOS:4-6), Protease #3 (SEQ ID NOS:7-9) and Protease #4 (SEQ ID NOS:10-12), are provided below. Protease #1 comprises contig_1809, Protease #2 comprises contig_690, and Protease #3 comprises contig_1086 as described in the Examples.

Protease #1:

gDNA:

(SEQ ID NO: 1) ATGCAGCTCCTTAGTCTCGCCGCTCTCCTCCCCCTTGCCCTTGCGGCACCGGTGATCAAGCCTCAGGGGCTCCAGCTGATTCCGGGCGACTACATCGTGAAGCTGAAGGACGGTGCGTCCGAGAGCACTCTCCAGGACACCATCCGGCACCTCCAGGCAGGCGAGGCCAAGCATGTCTACCGCGCACGCCGGTTCAAGGGCTTCGCGGCCAAGCTGAGCCCGCAGGTGGTCGATACCCTGAGCAAGCTGCCCGAGGTTCGTTCGTCGTCTCATGTGTAATTATGTCACAAAAAGGGATATGTAGGATGCTAATTCAGACCCGCAGGTCGAATACATTGAGCAGGACGCCGTCGTCACCATCCAGGCGCTGGTCACCCAGGAGGACGTGCCCTGGGGTCTGGCCCGCATCTCGCACCACGAACTGGGTCCCACGTCGTACGTATACGACGACAGCGCCGGCGAGGGTACCTGCGCCTATGTCATCGACACGGGCATCTATGTGGCCCACTCTGTAAGTCTGGCCGTCAATTCACCCACTCTCCCGCTGCTGCCACCGAATCTCTATTAGTATCTTGACGACTTTGTTGCGGAGACAACGACGCTGACTCTTTTGACTCCAGCAGTTCGAAGGCCGCGCGACGTGGCTGGCCAACTTTATCGACAGCAGCGATAGCGAGTCAGTTTAGCATCCCCCACCCCCTGGTTGTTGCACTTGAATGAGCTGACCTTTCATAAATAAACAGCGGCGCGGGCCACGGCACGCACGTGTCGGGCACGATCGGCGGCGTGACGTACGGCGTGGCCAAGAAGACCAAGCTGTTCGCGGTCAAGGTGCTCAACGCGAGCGGGTCGGGGACGGTGTCGTCGGTGCTGGCGGGGCTCGAGTTCGTCGCGTCGGACGCGCCGGCGCGCGTCGCCTCGGGCGAGTGCGCCAACGGCGCGGTCGCCAACCTGAGCCTCGGCGGCGGCCGGTCCACCGCCATCAACGCCGCCGCCGCCGCCGCCGTCGACGCGGGCGTCTTCGTCGCCGTCGCGGCCGGCAACAGCAACACCGACGCCCAGTCCACCTCCCCCGCCAGCGAGCCCAGCGTCTGCACCGTCGGCGCCACCGACGACAGCGACGCCCGCGCCTACTTCTCCAACTACGGCAGCGTCGTCGACGTCTTTGCTCCCGGCGTCGACGTCCTCAGCAGCTGGATCGGCGGTGTCGATGCCACTGTGAGTTTTTTTTTTTCCTTTTCCCGTTTCTTTTTGCTTCTTGTTTTCTCCCCATTTTGATGTTTTACATTACTTTCCTTCTTCGTTGGCCGGATTCGTTTTCATCCTTTTTTTCTTCTTTCTTCTGTCAAAAGGCGATAACAAGGGATGATGCGGAAAGAGAGAAGAGGAATAAAAACGGGGAACCAGAACAAGAACATACCAGGCTGACTGGAAAACAAACAGAACACCATCTCGGGCACCTCGATGGCGACCCCGCATATCGCCGGCCTCGGGGCCTATCTCCTCGCTCTGCTGGGCCCCAGGTCGCCCGAGGAACTGTGCGAGTACATCAAGCAGACGGCCACCATCGGCACCATCACCAGCCTCCCCAGCGGCACCATCAACGCCATTGCCTACAACGGTGCTACAGCCTAA

cDNA:

(SEQ ID NO: 2) ATGCAGCTCCTTAGTCTCGCCGCTCTCCTCCCCCTTGCCCTTGCGGCACCGGTGATCAAGCCTCAGGGGCTCCAGCTGATTCCGGGCGACTACATCGTGAAGCTGAAGGACGGTGCGTCCGAGAGCACTCTCCAGGACACCATCCGGCACCTCCAGGCAGGCGAGGCCAAGCATGTCTACCGCGCACGCCGGTTCAAGGGCTTCGCGGCCAAGCTGAGCCCGCAGGTGGTCGATACCCTGAGCAAGCTGCCCGAGGTCGAATACATTGAGCAGGACGCCGTCGTCACCATCCAGGCGCTGGTCACCCAGGAGGACGTGCCCTGGGGTCTGGCCCGCATCTCGCACCACGAACTGGGTCCCACGTCGTACGTATACGACGACAGCGCCGGCGAGGGTACCTGCGCCTATGTCATCGACACGGGCATCTATGTGGCCCACTCTCAGTTCGAAGGCCGCGCGACGTGGCTGGCCAACTTTATCGACAGCAGCGATAGCGACGGCGCGGGCCACGGCACGCACGTGTCGGGCACGATCGGCGGCGTGACGTACGGCGTGGCCAAGAAGACCAAGCTGTTCGCGGTCAAGGTGCTCAACGCGAGCGGGTCGGGGACGGTGTCGTCGGTGCTGGCGGGGCTCGAGTTCGTCGCGTCGGACGCGCCGGCGCGCGTCGCCTCGGGCGAGTGCGCCAACGGCGCGGTCGCCAACCTGAGCCTCGGCGGCGGCCGGTCCACCGCCATCAACGCCGCCGCCGCCGCCGCCGTCGACGCGGGCGTCTTCGTCGCCGTCGCGGCCGGCAACAGCAACACCGACGCCCAGTCCACCTCCCCCGCCAGCGAGCCCAGCGTCTGCACCGTCGGCGCCACCGACGACAGCGACGCCCGCGCCTACTTCTCCAACTACGGCAGCGTCGTCGACGTCTTTGCTCCCGGCGTCGACGTCCTCAGCAGCTGGATCGGCGGTGTCGATGCCACTAACACCATCTCGGGCACCTCGATGGCGACCCCGCATATCGCCGGCCTCGGGGCCTATCTCCTCGCTCTGCTGGGCCCCAGGTCGCCCGAGGAACTGTGCGAGTACATCAAGCAGACGGCCACCATCGGCACCATCACCAGCCTCCCCAGCGGCACCATCAACGCCATTGCCTACAACG GTGCTACAGCCTAA

Polypeptide:

(SEQ ID NO: 3) MQLLSLAALLPLALAAPVIKPQGLQLIPGDYIVKLKDGASESTLQDTIRHLQAGEAKHVYRARRFKGFAAKLSPQVVDTLSKLPEVEYIEQDAVVTIQALVTQEDVPWGLARISHHELGPTSYVYDDSAGEGTCAYVIDTGIYVAHSQFEGRATWLANFIDSSDSDGAGHGTHVSGTIGGVTYGVAKKTKLFAVKVLNASGSGTVSSVLAGLEFVASDAPARVASGECANGAVANLSLGGGRSTAINAAAAAAVDAGVFVAVAAGNSNTDAQSTSPASEPSVCTVGATDDSDARAYFSNYGSVVDVFAPGVDVLSSWIGGVDATNTISGTSMATPHIAGLGAYLLALLGPRSPEELCEYIKQTATIGTITSLPSGTINAIAYNGATA
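By way of illustration only, the correspondence between a cDNA sequence and its polypeptide sequence, such as those listed above, can be checked by translating the cDNA with the standard genetic code. The following is a minimal sketch; the function name `translate` and the short test fragments are chosen for illustration, and the sketch assumes an unambiguous A/C/G/T sequence read in frame from its first base.

```python
from itertools import product

# Standard genetic code, with amino acids listed in T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna: str) -> str:
    """Translate a cDNA string in frame 1, stopping at the first stop codon.

    Whitespace is ignored. A codon containing a character other than
    A, C, G or T raises KeyError (ambiguity codes are not handled).
    """
    seq = "".join(cdna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon: end of the open reading frame
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    # First 27 bases of SEQ ID NO: 2, copied from the listing above.
    print(translate("ATGCAGCTCCTTAGTCTCGCCGCTCTC"))
```

Translating the opening bases of SEQ ID NO: 2 in this way gives the residues MQLLSLAAL, which agree with the start of SEQ ID NO: 3. Comparing a full translation against the listed polypeptide is one way to detect transcription errors in a sequence listing.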

Protease #2:

gDNA:

(SEQ ID NO: 4) ATGAGGTTACTCCGCACCGCGGGAGCGGCAACTCTCTTCCTGTCGCCCGCCACTTTTGCGACCAACAACCCTCTGACCCCAGGCAAACTTGAGGCGGACATTAGAACCGAAGAGTATGAGAAGACAACAGTGCCAAACCTTTGATCCCTCTCATTCGTTAACGAATATTGCCAAACCAGGTTGCAAAATGTCCTCTGGAACCTCAATCACATTGCGGTCACCCACGGCGGCAACCGAGCCTTTGGCGAGCCTGGGTACAAAGCCTCGCTCGACTTTATTCTCGAGCGCGCCCAGACACGCTTCCACAATGAGTTTGACACTGTCGTTCAGCCCTTCAACCACACCTACGGCAAGACGAACCAGATCAAGGTGACTGGACCAGAGGGCGAGGATGTCTTTGTCATCAGCCCATTGTACAATCCCGCCACGCCGCTGCCTGATGGTATCACCGCTCCCTTGGTAGATACACCGGTCGATGACGAGCGCGGATCGGCGTGCTTTCCGGACCAGTGGGAGGGGGTCGATGTGAAGGGGAAGCTGGTACTAGTAAAGAGAGGCATTTGTGCTGTGGCAGATAAGTCGGCCCTTGCTAAGGAGCGCGGGGCACTGGGTGAGCTACGTCCTGGCTGACGGGGGAAGCAAACGTTGACGTCGCTCTAGGGGTGATCTTGTATAACGAACAGCCGGGTACGAACATCGTCGTCCCGACTCTGGGTGCAGAGAGCATCGGCAAGACTGTTCCTATCGGAATTATTCCCTTGGAAGTAGGACAGAGCTGGAAGTCCCGGTTGGCAGATGGCGAGGAGGTGACTGTGCACCTGCTGGTCGATTCCATATCCGATACGCGCGAGACGTGGAACATTATTGCCGAGACCAAACAGGGCGACCCCGACAAAGTTATCATGCTCGGTGCACATCTCGACAGCGTGCAGGCGGGAGCAGGCATCAATGACGACGGCAGCGGCACGGCAGCTCTCCTGGAGATCTTGACCGCGGTCCGGCGCTACGATGGATTCCCACATAAGATTCGGTTCGCCTGGTGGGCAGCAGAAGAGAGTGGTCTGGTCGGATCCCTCTACTACACCTCCCACTTGACCGAGGAGGAAGCCGACCGCATCAAGTATTACTTCAACTACGACATGATTGGCTCTCCCCATCCCGACTTTGAAATTGCAAGCGATGGCAACAGCGGAGTCGGGCCGCAGCTTCTGGAGGAATACCTCGTCGAGCAGGGGAAGGAGATTGTCCACGGGTAAGTAGATCCCACTCCAGCTCCACATCTATTTTGCGTACCTGGTACCTCTATGATATGTGCAGGTTCCGCTGACCTTGGGATGCAAGCGGCTTCGGTTCTGGCTCCGATTTTGTGGGCTTCCTCGAGCTTGGCATCCCGAGTACCGCGCTACATACCGGTGCAGGAGCTCCATTCGACGAATGCTACCACCAGGCGTGTGATGACCTCGACAATATCAACTGGGAGGCGCTGACCGTCAATGCCAAAGCGGCCGCTCGGGCGGCTGCCCGGCTGGCCAACTCGCTCGAGGGCGTGCCGCCCCGCAAGAAAACTAGCCTGAATCTTCACACGCGCCGTGGAGTGGTGCAAAACTTCCGAAAGTGGGCTTCATTGGCCGAGGAAGCGAGCCACGGGCACACGTGCTCGCACACGGGAAAGAGGGTCGTAGTGTAA

cDNA:

(SEQ ID NO: 5) ATGAGGTTACTCCGCACCGCGGGAGCGGCAACTCTCTTCCTGTCGCCCGCCACTTTTGCGACCAACAACCCTCTGACCCCAGGCAAACTTGAGGCGGACATTAGAACCGAAGAGTTGCAAAATGTCCTCTGGAACCTCAATCACATTGCGGTCACCCACGGCGGCAACCGAGCCTTTGGCGAGCCTGGGTACAAAGCCTCGCTCGACTTTATTCTCGAGCGCGCCCAGACACGCTTCCACAATGAGTTTGACACTGTCGTTCAGCCCTTCAACCACACCTACGGCAAGACGAACCAGATCAAGGTGACTGGACCAGAGGGCGAGGATGTCTTTGTCATCAGCCCATTGTACAATCCCGCCACGCCGCTGCCTGATGGTATCACCGCTCCCTTGGTAGATACACCGGTCGATGACGAGCGCGGATCGGCGTGCTTTCCGGACCAGTGGGAGGGGGTCGATGTGAAGGGGAAGCTGGTACTAGTAAAGAGAGGCATTTGTGCTGTGGCAGATAAGTCGGCCCTTGCTAAGGAGCGCGGGGCACTGGGGGTGATCTTGTATAACGAACAGCCGGGTACGAACATCGTCGTCCCGACTCTGGGTGCAGAGAGCATCGGCAAGACTGTTCCTATCGGAATTATTCCCTTGGAAGTAGGACAGAGCTGGAAGTCCCGGTTGGCAGATGGCGAGGAGGTGACTGTGCACCTGCTGGTCGATTCCATATCCGATACGCGCGAGACGTGGAACATTATTGCCGAGACCAAACAGGGCGACCCCGACAAAGTTATCATGCTCGGTGCACATCTCGACAGCGTGCAGGCGGGAGCAGGCATCAATGACGACGGCAGCGGCACGGCAGCTCTCCTGGAGATCTTGACCGCGGTCCGGCGCTACGATGGATTCCCACATAAGATTCGGTTCGCCTGGTGGGCAGCAGAAGAGAGTGGTCTGGTCGGATCCCTCTACTACACCTCCCACTTGACCGAGGAGGAAGCCGACCGCATCAAGTATTACTTCAACTACGACATGATTGGCTCTCCCCATCCCGACTTTGAAATTGCAAGCGATGGCAACAGCGGAGTCGGGCCGCAGCTTCTGGAGGAATACCTCGTCGAGCAGGGGAAGGAGATTGTCCACGGCGGCTTCGGTTCTGGCTCCGATTTTGTGGGCTTCCTCGAGCTTGGCATCCCGAGTACCGCGCTACATACCGGTGCAGGAGCTCCATTCGACGAATGCTACCACCAGGCGTGTGATGACCTCGACAATATCAACTGGGAGGCGCTGACCGTCAATGCCAAAGCGGCCGCTCGGGCGGCTGCCCGGCTGGCCAACTCGCTCGAGGGCGTGCCGCCCCGCAAGAAAACTAGCCTGAATCTTCACACGCGCCGTGGAGTGGTGCAAAACTTCCGAAAGTGGGCTTCATTGGCCGAGGAAGCGAGCCACGGGCACACGTGCTCGCACACGGGAAAGAGGGTCGTAGTGTAA

Polypeptide:

(SEQ ID NO: 6) MRLLRTAGAATLFLSPATFATNNPLTPGKLEADIRTEELQNVLWNLNHIAVTHGGNRAFGEPGYKASLDFILERAQTRFHNEFDTVVQPFNHTYGKTNQIKVTGPEGEDVFVISPLYNPATPLPDGITAPLVDTPVDDERGSACFPDQWEGVDVKGKLVLVKRGICAVADKSALAKERGALGVILYNEQPGTNIVVPTLGAESIGKTVPIGIIPLEVGQSWKSRLADGEEVTVHLLVDSISDTRETWNIIAETKQGDPDKVIMLGAHLDSVQAGAGINDDGSGTAALLEILTAVRRYDGFPHKIRFAWWAAEESGLVGSLYYTSHLTEEEADRIKYYFNYDMIGSPHPDFEIASDGNSGVGPQLLEEYLVEQGKEIVHGGFGSGSDFVGFLELGIPSTALHTGAGAPFDECYHQACDDLDNINWEALTVNAKAAARAAARLANSLEGVPPRKKTSLNLHTRRGVVQNFRKWASLAEEASHGHTCSHTGKRVVV

Protease #3:

gDNA:

(SEQ ID NO: 7) ATGTGTTGGCTGTGGGAGCGATCAGTGGCAATATTACTGGCGGCCGGCGTGATCGCCAACCCGCTCCGCCCGCGCCGGATCCCCTGGCCGGAGCCGGTTCCGGCATCTTCCATCGGGCCCATTGACTGGTCTTCAATACCGCCTTCTCCCTACAAACACGCCTTGCGGCAGACCAACACCACCACGACCAGCAGCAGTAGCAGCAGCAGCAGCAGCAAATATGACAATCAAGTCTACTCGGTACAGGTCTCGGGATCTTCCTCCTCCCCGCCAGCATCCGTCGACTGGCGCAACCGCGACGGCCAGAACTACATCACGACACCGCAGGACCAGGGCGCCTGTAACAGCTGCTGGGCGTTCGCCGTGGCGGCGCTGATCGAGTCCATGATGCGCATCGAGCACGGGGTCTGGGGCAAGCGCAGCGAGGCCGACGTGCACGACGGGGTGGGCGCGGCGTGCGAGAGCGTGGGCAACGCCGAGGACACGCTGGCCTGGGTGGCCGGGCAGGGGCCCGAATTCGTCGCCGACCCGACCCGGCCCGCCCCGGGCATCGCCGACTGGGCCTGCGACCCCTACGAGGCGACGGCGCACGCCTACGAGCACTGCGACGACCGCTCCGGGCGCACGACGCACATTCCCTACTACCAGGCCCTCGGCCTGGTCGAGGACCAGAAGCGGTGGCTGGACGAGTACGGGCCCATCATCGCCACCTTTGTCCTCTACGACGACTTTGGCTCGTGGAAGCCGACCGCGGCCGGCGGAAGCGGCGGTGACGTGTACCGGTGGGACGGCGTTTCCGGCTCGGACGGCAACCACCTCGCCATCGTGATCGGCTACGACGACGAGAAGCAGGCCTGGCTTATGAAGAACTCATGGGGATCCGGATGGGGGGACGAGGGATTTGTCTACTTTGCGTAAGTCAGGGGTTCCACTGCTTTTTTTTTTTTCCCCTCCAAAATCGTTTGCCTCTCGGTAATTTTATCCGCATCCAGGGAACTGACAACAGATACAGGTACGGCGAGGCCAACATCGACAACTGGACCAAGTATGGGCTCGTCAATGTCAACCCGGACCCGTGGACACGCAGGAAGCACCAGAGCGGAAGCATGATGCAATCCGGCAACGGCGAGACGCACCGAAACTTTGAGCTGCTCGTCAGCGAGGCCGGGGGTTCCGGCTTCACGCACGTCTCCCGCGATGGGAACAGTACCCAATGGAGCAAGGTGCTGGAGGTCTCGGGCAGCGGCAGCGGCAGCGGCCTCGTGGGCCAGCCTGCCATTCTCGGCACCTCCTTCAACCGGGACTTCCACGCGGTGAGCCTGGATGAGAACCAGGTGGTCCAACAGTGGGCATACAGACAGTCGGAGATGCGCTGGTCCCGGGTCTCGGCCATCGAGGGCACTAAGATCGACGGCTTTCCCGGTCTCGCCCAGAGCGACGGCTCAACTCTGGTCATGGTGGTCAAGCACGCCGACGGCACCCTGAACGAGGTAAGCATATCTTGCCGGAAGTCATAATTAACGAAGGAAGATCTTCCGTAAAAGAAAAGGAAAAGATGAAAAAAAAAAGGTACACGTGCTAACGGCGGATCGCACAAGTGGCAACAAGCACCCAACAGCACAACCTGGACCCTGGCCAACTCACCCATCGCAAGCGGCATCGCCCAGAGCGGGCCGGCGCTCGTGCAGTCCAACGCCGGACTCAACCTCTACGACCGGCAGCAGGGCGCCTCGCGGGGCAACATCTACACCGTCGCGGTCCGCGAGGACGGCAAGCTGCAGCTCTTCTGGCGCCCCGGCGCGGACGCGGCCGGGTGGTCGGCCGGGGAGGTGTTCGGCGGCTCCGGCGTCGTGGACCCCGGCTCGCCGCCCGTCATGATTCAGGACTACTCGGGGACGGCCAACGAGACGAGCGTCGGCCGGTTCCAGCTGGCCGTCGCCGTCGGGGGGAGCGTCCAACACTGGGAGCGGGCCAACGACGACCTC
GAGGCCGGGCAGGCCCCGCCCGCGGGGGCAGAAGGGGGGTCCCCGGCGGGCAGGTGGGAACTGGTCGAGACGGCGGGCACCGGGGTGAAGCGCGTCTGGGCGCTGCTCCAGGGGAGCTTTGGTGGGAGGCTGCACATGATCACGGAGGGCACGGACGGCCGGCTGTCGTACTGGGAGCGCGATGAGAAGTGGGTTGAGGTCGAGAAGCTGCCGGCGTTGAGCGACGCCGCTTGGACGAGATCGGGCCCGGTGAGTGGTGGTTGAGGGTAGTCCCAAGTACCTGATTATAATTATATGAAAGAGATGTCCCCCGAATAATTATATGAGTGAACCAACGACCATGAAGACATGCGGCTTTATCAGCATACCGACGCGACTTGTCCTGGTTGCATCTGCTACGACCCCTGATTAATTACAACACCGCACAGCGGCAGAGACGGGGCCAGAAGCTGCACATAGAAAGAAGGCTGGACAACTTCCCCGAGACGCTATAA

cDNA:

(SEQ ID NO: 8) ATGTGTTGGC TGTGGGAGCG ATCAGTGGCA ATATTACTGG CGGCCGGCGTGATCGCCAAC CCGCTCCGCC CGCGCCGGAT CCCCTGGCCG GAGCCGGTTC CGGCATCTTCCATCGGGCCC ATTGACTGGT CTTCAATACC GCCTTCTCCC TACAAACACG CCTTGCGGCAGACCAACACC ACCACGACCA GCAGCAGTAG CAGCAGCAGCAGCAGCAAATATGACAATCAAGTCTACTCG GTACAGGTCT CGGGATCTTC CTCCTCCCCGCCAGCATCCG TCGACTGGCG CAACCGCGAC GGCCAGAACT ACATCACGAC ACCGCAGGACCAGGGCGCCT GTAACAGCTGCTGGGCGTTC GCCGTGGCGG CGCTGATCGA GTCCATGATGCGCATCGAGC ACGGGGTCTGGGGCAAGCGC AGCGAGGCCG ACGTGCACGACGGGGTGGGCGCGGCGTGCGAGAGCGTGGGCAACGCCGAG GACACGCTGG CCTGGGTGGCCGGGCAGGGG CCCGAATTCG TCGCCGACCCGACCCGGCCC GCCCCGGGCA TCGCCGACTGGGCCTGCGAC CCCTACGAGG CGACGGCGCACGCCTACGAG CACTGCGACG ACCGCTCCGGGCGCACGACG CACATTCCCT ACTACCAGGC CCTCGGCCTG GTCGAGGACC AGAAGCGGTGGCTGGACGAG TACGGGCCCA TCATCGCCAC CTTTGTCCTC TACGACGACT TTGGCTCGTGGAAGCCGACC GCGGCCGGCG GAAGCGGCGGTGACGTGTAC CGGTGGGACG GCGTTTCCGGCTCGGACGGC AACCACCTCG CCATCGTGAT CGGCTACGAC GACGAGAAGC AGGCCTGGCTTATGAAGAACTCATGGGGATCC GGATGGGGGGACGAGGGA TTTGTCTACT TTGCGTACGG CGAGGCCAAC ATCGACAACT GGACCAAGTA TGGGCTCGTC AATGTC AACC CGGACCCGTGGACACGCAGG AAGCACCAGAGCGG AAGCATGATGCAATCCGGCAACGGCG AGACGCACCG AAACTTTGAG CTGCTCGTCA GCGAGGCCGGGGGTTCCGGCTTCACGCACG TCTCCCGCGA TGGGAACAGTACCCAATGGA GCAAGGTGCT GGAGGTCTCG GGCAGCGGCA GCGGCAGCGG CCTCGTGGGCCAGCCTGCCA TTCTCGGCAC CTCCTTCAAC CGGGACTTCC ACGCGGTGAG CCTGGATGAGAACCAGGTGG TCCAACAGTGGCATACAGA CAGTCGGAGA TGCGCTGGTC CCGGGTCTCGGCCATCGAGG GCACTAAGAT CGACGGCTTT CCCGGTCTCG CCCAGAGCGA CGGCTCAACTCTGGTCATGG TGGTCAAGCA CGCCGACGGC ACCCTGAACG AGTGGCAACA AGCACCCAACAGCACAACCT GGACCCTGGCCAACTCACCC ATCGCAAGCG GCATCGCCCA GAGCGGGCCGGCGCTCGTGC AGTCCAACGCGGACTCAAC CTCTACGACC GGCAGCAGGG CGCCTCGCGGGGCAACATCT ACACCGTCGC GGTCCGCGAG GACGGCAAGC TGCAGCTCTT CTGGCGCCCCGGCGCGGACG CGGCCGGGTGGTCGGCCGGG GAGGTGTTCG GCGGCTCCGG CGTCGTGGACCCCGGCTCGC CGCCCGTCAT GATTCAGGAC TACTCGGGGA CGGCCAACGA GACGAGCGTCGGCCGGTTCC AGCTGGCCGTCGCCGTCGGG GGGAGCGTCC AACACTGGGA GCGGGCCAACGACGACCTCGAGGCCGGGCAGGCCCCGCCC GCGGGGGCAG AAGGGGGGTCCCGGCGGGCAGGTGGGAACTG 
GTCGAGACGGCGGGCACCGGGGTGAAGC GCGTCTGGGC GCTGCTCCAGGGGAGCTTTG GTGGGAGGCT GCACATGATC ACGGAGGGCA CGGACGGCCG GCTGTCGTACTGGGAGCGC GATGAGAAGTGGGTTGAGGTCGAGAAGCTGC CGGCGTTGAG CGACGCCGCTTGGACGAGAT CGGGCCCGGTGAGTGGTGGT TGA

Polypeptide:

(SEQ ID NO: 9) MCWLWERSVA ILLAAGVIAN PLRPRRIPWP EPVPASSIGP IDWSSIPPSPYKHALRQTNT TTTSSSSSSS SSKYDNQVYS VQVSGSSSSP PASVDWRNRD GQNYITTPQDQGACNSCWAF AVAALIESMM RIEHGVWGKR SEADVHDGVG AACESVGNAE DTLAWVAGQGPEFVADPTRP APGIADWACD PYEATAHAYE HCDDRSGRTT HIPYYQALGL VEDQKRWLDEYGPIIATFVL YDDFGSWKPT AAGGSGGDVY RWDGVSGSDGNHLAIVIGYDDEKQAWLMKNSWGSGWGDEGFVYFAYGEAN IDNWTKYGLV NVNPDPWTRR KHQSGSMMQSGNGETHRNFE LLVSEAGGSG FTHVSRDGNS TQWSKVLEVS GSGSGSGLVG QPAILGTSFNRDFHAVSLDE NQVVQQWAYRQSEMRWSRVS AIEGTKIDGF PGLAQSDGST LVMVVKHADGTLNEWQQAPN STTWTLANSP IASGIAQSGP ALVQSNAGLN LYDRQQGASR GNIYTVAVREDGKLQLFWRP GADAAGWSAG EVFGGSGVVD PGSPPVMIQD YSGTANETSV GRFQLAVAVGGSVQHWERAN DDLEAGQAPP AGAEGGSPAG RWELVETAGT GVKRVWALLQ GSFGGRLHMITEGTDGRLSY WERDEKWVEV EKLPALSDAA WTRSGPVSGG

Protease #4:

gDNA:

(SEQ ID NO: 10) ATGTGTTGGCTGTGGGAGCGATCAGTGGCAATATTACTGGCGGCCGGCGTGATCGCCAACCCGCTCCGCCCGCGCCGGATCCCCTGGCCGGAGCCGGTTCCGGCATCTTCCATCGGGCCCATTGACTGGTCTTCAATACCGCCTTCTCCCTACAAACACGCCTTGCGGCAGACCAACACCACCACGACCAGCAGCAGTAGCAGCAGCAGCAGCAGCAAATATGACAATCAAGTCTACTCGGTACAGGTCTCGGGATCTTCCTCCTCCCCGCCAGCATCCGTCGACTGGCGCAACCGCGACGGCCAGAACTACATCACGACACCGCAGGACCAGGGCGCCTGTAACAGCTGCTGGGCGTTCGCCGTGGCGGCGCTGATCGAGTCCATGATGCGCATCGAGCACGGGGTCTGGGGCAAGCGCAGCGAGGCCGACGTGCACGACGGGGTGGGCGCGGCGTGCGAGAGCGTGGGCAACGCCGAGGACACGCTGGCCTGGGTGGCCGGGCAGGGGCCCGAATTCGTCGCCGACCCGACCCGGCCCGCCCCGGGCATCGCCGACTGGGCCTGCGACCCCTACGAGGCGACGGCGCACGCCTACGAGCACTGCGACGACCGCTCCGGGCGCACGACGCACATTCCCTACTACCAGGCCCTCGGCCTGGTCGAGGACCAGAAGCGGTGGCTGGACGAGTACGGGCCCATCATCGCCACCTTTGTCCTCTACGACGACTTTGGCTCGTGGAAGCCGACCGCGGCCGGCGGAAGCGGCGGTGACGTGTACCGGTGGGACGGCGTTTCCGGCTCGGACGGCAACCACCTCGCCATCGTGATCGGCTACGACGACGAGAAGCAGGCCTGGCTTATGAAGAACTCATGGGGATCCGGATGGGGGGACGAGGGATTTGTCTACTTTGCGTAAGTCAGGGGTTCCACTGCTTTTTTTTTTTTCCCCTCCAAAATCGTTTGCCTCTCGGTAATTTTATCCGCATCCAGGGAACTGACAACAGATACAGGTACGGCGAGGCCAACATCGACAACTGGACCAAGTATGGGCTCGTCAATGTCAACCCGGACCCGTGGACACGCAGGAAGCACCAGAGCGGAAGCATGATGCAATCCGGCAACGGCGAGACGCACCGAAACTTTGAGCTGCTCGTCAGCGAGGCCGGGGGTTCCGGCTTCACGCACGTCTCCCGCGATGGGAACAGTACCCAATGGAGCAAGGTGCTGGAGGTCTCGGGCAGCGGCAGCGGCAGCGGCCTCGTGGGCCAGCCTGCCATTCTCGGCACCTCCTTCAACCGGGACTTCCACGCGGTGAGCCTGGATGAGAACCAGGTGGTCCAACAGTGGGCATACAGACAGTCGGAGATGCGCTGGTCCCGGGTCTCGGCCATCGAGGGCACTAAGATCGACGGCTTTCCCGGTCTCGCCCAGAGCGACGGCTCAACTCTGGTCATGGTGGTCAAGCACGCCGACGGCACCCTGAACGAGGTAAGCATATCTTGCCGGAAGTCATAATTAACGAAGGAAGATCTTCCGTAAAAGAAAAGGAAAAGATGAAAAAAAAAAGGTACACGTGCTAACGGCGGATCGCACAAGTGGCAACAAGCACCCAACAGCACAACCTGGACCCTGGCCAACTCACCCATCGCAAGCGGCATCGCCCAGAGCGGGCCGGCGCTCGTGCAGTCCAACGCCGGACTCAACCTCTACGACCGGCAGCAGGGCGCCTCGCGGGGCAACATCTACACCGTCGCGGTCCGCGAGGACGGCAAGCTGCAGCTCTTCTGGCGCCCCGGCGCGGACGCGGCCGGGTGGTCGGCCGGGGAGGTGTTCGGCGGCTCCGGCGTCGTGGACCCCGGCTCGCCGCCCGTCATGATTCAGGACTACTCGGGGACGGCCAACGAGACGAGCGTCGGCCGGTTCCAGCTGGCCGTCGCCGTCGGGGGGAGCGTCCAACACTGGGAGCGGGCCAACGACGACCT
CGAGGCCGGGCAGGCCCCGCCCGCGGGGGCAGAAGGGGGGTCCCCGGCGGGCAGGTGGGAACTGGTCGAGACGGCGGGCACCGGGGTGAAGCGCGTCTGGGCGCTGCTCCAGGGGAGCTTTGGTGGGAGGCTGCACATGATCACGGAGGGCACGGACGGCCGGCTGTCGTACTGGGAGCGCGATGAGAAGTGGGTTGAGGTCGAGAAGCTGCCGGCGTTGAGCGACGCCGCTTGGACGAGATCGGGCCCGGTGAGTGGTGGTTGAGGGTAGTCCCAAGTACCTGATTATAATTATATGAAAGAGATGTCCCCCGAATAATTATATGAGTGAACCAACGACCATGAAGACATGCGGCTTTATCAGCATACCGACGCGACTTGTCCTGGTTGCATCTGCTACGACCCCTGATTAATTACAACACCGCACAGCGGCAGAGACGGGGCCAGAAGCTGCACATAGAAAGAAGGCTGGACAACTTCCCCGAGACGCTA

cDNA:

(SEQ ID NO: 11) ATGTGTTGGCTGTGGGAGCGATCAGTGGCAATATTACTGGCGGCCGGCGTGATCGCCAACCCGCTCCGCCCGCGCCGGATCCCCTGGCCGGAGCCGGTTCCGGCATCTTCCATCGGGCCCATTGACTGGTCTTCAATACCGCCTTCTCCCTACAAACACGCCTTGCGGCAGACCAACACCACCACGACCAGCAGCAGTAGCAGCAGCAGCAGCAGCAAATATGACAATCAAGTCTACTCGGTACAGGTCTCGGGATCTTCCTCCTCCCCGCCAGCATCCGTCGACTGGCGCAACCGCGACGGCCAGAACTACATCACGACACCGCAGGACCAGGGCGCCTGTAACAGCTGCTGGGCGTTCGCCGTGGCGGCGCTGATCGAGTCCATGATGCGCATCGAGCACGGGGTCTGGGGCAAGCGCAGCGAGGCCGACGTGCACGACGGGGTGGGCGCGGCGTGCGAGAGCGTGGGCAACGCCGAGGACACGCTGGCCTGGGTGGCCGGGCAGGGGCCCGAATTCGTCGCCGACCCGACCCGGCCCGCCCCGGGCATCGCCGACTGGGCCTGCGACCCCTACGAGGCGACGGCGCACGCCTACGAGCACTGCGACGACCGCTCCGGGCGCACGACGCACATTCCCTACTACCAGGCCCTCGGCCTGGTCGAGGACCAGAAGCGGTGGCTGGACGAGTACGGGCCCATCATCGCCACCTTTGTCCTCTACGACGACTTTGGCTCGTGGAAGCCGACCGCGGCCGGCGGAAGCGGCGGTGACGTGTACCGGTGGGACGGCGTTTCCGGCTCGGACGGCAACCACCTCGCCATCGTGATCGGCTACGACGACGAGAAGCAGGCCTGGCTTATGAAGAACTCATGGGGATCCGGATGGGGGGACGAGGGATTTGTCTACTTTGCGTACGGCGAGGCCAACATCGACAACTGGACCAAGTATGGGCTCGTCAATGTCAACCCGGACCCGTGGACACGCAGGAAGCACCAGAGCGGAAGCATGATGCAATCCGGCAACGGCGAGACGCACCGAAACTTTGAGCTGCTCGTCAGCGAGGCCGGGGGTTCCGGCTTCACGCACGTCTCCCGCGATGGGAACAGTACCCAATGGAGCAAGGTGCTGGAGGTCTCGGGCAGCGGCAGCGGCAGCGGCCTCGTGGGCCAGCCTGCCATTCTCGGCACCTCCTTCAACCGGGACTTCCACGCGGTGAGCCTGGATGAGAACCAGGTGGTCCAACAGTGGGCATACAGACAGTCGGAGATGCGCTGGTCCCGGGTCTCGGCCATCGAGGGCACTAAGATCGACGGCTTTCCCGGTCTCGCCCAGAGCGACGGCTCAACTCTGGTCATGGTGGTCAAGCACGCCGACGGCACCCTGAACGAGTGGCAACAAGCACCCAACAGCACAACCTGGACCCTGGCCAACTCACCCATCGCAAGCGGCATCGCCCAGAGCGGGCCGGCGCTCGTGCAGTCCAACGCCGGACTCAACCTCTACGACCGGCAGCAGGGCGCCTCGCGGGGCAACATCTACACCGTCGCGGTCCGCGAGGACGGCAAGCTGCAGCTCTTCTGGCGCCCCGGCGCGGACGCGGCCGGGTGGTCGGCCGGGGAGGTGTTCGGCGGCTCCGGCGTCGTGGACCCCGGCTCGCCGCCCGTCATGATTCAGGACTACTCGGGGACGGCCAACGAGACGAGCGTCGGCCGGTTCCAGCTGGCCGTCGCCGTCGGGGGGAGCGTCCAACACTGGGAGCGGGCCAACGACGCCTCGAGGCCGGGCAGGCCCCGCCCGCGGGGGCAGAAGGGGGGTCCCCGGCGGGCAGGTGGGAACTGGTCGAGACGGCGGGCACCGGGGTGAAGCGCGTCTGGGCGCTGCTCCAGGGGAGCTTTGGTGGGAGGCTGCACATGATCACGGAGGGCACGGACGGCCGGCTGTCGTACTGGGAGCGCGATGAGAAGTGGGTTGAGGTCGAGAA
GCTGCCGGCGTTGAGCGACGCCGCTTGGACGAGATCGGGCCCGGTGAGTGGTGGTTGA

Polypeptide:

(SEQ ID NO: 12) MCWLWERSVAILLAAGVIANPLRPRRIPWPEPVPASSIGPIDWSSIPPSPYKHALRQTNTTTTSSSSSSSSSKYDNQVYSVQVSGSSSSPPASVDWRNRDGQNYITTPQDQGACNSCWAFAVAALIESMMRIEHGVWGKRSEADVHDGVGAACESVGNAEDTLAWVAGQGPEFVADPTRPAPGIADWACDPYEATAHAYEHCDDRSGRTTHIPYYQALGLVEDQKRWLDEYGPIIATFVLYDDFGSWKPTAAGGSGGDVYRWDGVSGSDGNHLAIVIGYDDEKQAWLMKNSWGSGWGDEGFVYFAYGEANIDNWTKYGLVNVNPDPWTRRKHQSGSMMQSGNGETHRNFELLVSEAGGSGFTHVSRDGNSTQWSKVLEVSGSGSGSGLVGQPAILGTSFNRDFHAVSLDENQVVQQWAYRQSEMRWSRVSAIEGTKIDGFPGLAQSDGSTLVMVVKHADGTLNEWQQAPNSTTWTLANSPIASGIAQSGPALVQSNAGLNLYDRQQGASRGNIYTVAVREDGKLQLFWRPGADAAGWSAGEVFGGSGVVDPGSPPVMIQDYSGTANETSVGRFQLAVAVGGSVQHWERANDDLEAGQAPPAGAEGGSPAGRWELVETAGTGVKRVWALLQGSFGGRLHMITEGTDGRLSYWERDEKWVEVEKLPALSDAAWTRSGPRQRRGQKLHIERRLDNFPETL
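
By way of non-limiting illustration, the correspondence between a cDNA sequence and the encoded polypeptide can be checked by conceptual translation. The following is a minimal sketch that assumes the standard genetic code; the function name `translate` and the constant names are hypothetical and introduced only for this example. It translates the first nine codons of SEQ ID NO:11 and compares the result with the first nine residues of SEQ ID NO:12.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering
# of the three codon positions (assumed to apply to this organism).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cdna):
    """Translate a cDNA string in frame 1, stopping at the first stop codon.

    Raises KeyError if a codon contains a character other than A, C, G or T.
    Trailing bases that do not form a complete codon are ignored.
    """
    cdna = cdna.upper()
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        amino_acid = CODON_TABLE[cdna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 27 nucleotides of SEQ ID NO:11 (truncated here for brevity).
cdna_fragment = "ATGTGTTGGCTGTGGGAGCGATCAGTG"
print(translate(cdna_fragment))  # MCWLWERSV
```

The same function can be applied to any of the full-length polynucleotide sequences provided herein.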

The wild-type M. thermophila C1 GH61a cDNA (SEQ ID NO:13) and amino acid (SEQ ID NO:14) sequences are provided below. The signal sequence is underlined in SEQ ID NO:14. SEQ ID NO:15 provides the GH61a sequence without the signal sequence.

(SEQ ID NO: 13) ATGTCCAAGGCCTCTGCTCTCCTCGCTGGCCTGACGGGCGCGGCCCTCGTCGCTGCACATGGCCACGTCAGCCACATCGTCGTCAACGGCGTCTACTACAGGAACTACGACCCCACGACAGACTGGTACCAGCCCAACCCGCCAACAGTCATCGGCTGGACGGCAGCCGATCAGGATAATGGCTTCGTTGAACCCAACAGCTTTGGCACGCCAGATATCATCTGCCACAAGAGCGCCACCCCCGGCGGCGGCCACGCTACCGTTGCTGCCGGAGACAAGATCAACATCGTCTGGACCCCCGAGTGGCCCGAATCCCACATCGGCCCCGTCATTGACTACCTAGCCGCCTGCAACGGTGACTGCGAGACCGTCGACAAGTCGTCGCTGCGCTGGTTCAAGATTGACGGCGCCGGCTACGACAAGGCCGCCGGCCGCTGGGCCGCCGACGCTCTGCGCGCCAACGGCAACAGCTGGCTCGTCCAGATCCCGTCGGATCTCAAGGCCGGCAACTACGTCCTCCGCCACGAGATCATCGCCCTCCACGGTGCTCAGAGCCCCAACGGCGCCCAGGCCTACCCGCAGTGCATCAACCTCCGCGTCACCGGCGGCGGCAGCAACCTGCCCAGCGGCGTCGCCGGCACCTCGCTGTACAAGGCGACCGACCCGGGCATCCTCTTCAACCCCTACGTCTCCTCCCCGGATTACACCGTCCCCGGCCCGGCCCTCATTGCCGGCGCCGCCAGCTCGATCGCCCAGAGCACGTCGGTCGCCACTGCCACCGGCACGGCCACCGTTCCCGGCGGCGGCGGCGCCAACCCTACCGCCACCACCACCGCCGCCACCTCCGCCGCCCCGAGCACCACCCTGAGGACGACCACTACCTCGGCCGCGCAGACTACCGCCCCGCCCTCCGGCGATGTGCAGACCAAGTACGGCCAGTGTGGTGGCAACGGATGGACGGGCCCGACGGTGTGCGCCCCCGGCTCGAGCTGCTCCGTCCTCAACGAGTGGTACTCCCAGTGTTTGTAA (SEQ ID NO: 14)MSKASALLAGLTGAALVAAHGHVSHIVVNGVYYRNYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIVWTPEWPESHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLKAGNYVLRHEIIALHGAQSPNGAQAYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL (SEQ ID NO: 15)HGHVSHIVVNGVYYRNYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIVWTPEWPESHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLKAGNYVLRHEIIALHGAQSPNGAQAYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL
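
As a non-limiting illustration, the signal sequence of a polypeptide can be identified by comparing the full-length sequence with the corresponding sequence that lacks the signal sequence. The sketch below assumes that the two sequences differ only by an N-terminal extension; the function name `signal_sequence` is hypothetical, and the inputs are N-terminal fragments of SEQ ID NO:14 and SEQ ID NO:15 truncated at the same residue for brevity.

```python
def signal_sequence(full_length, mature):
    """Return the N-terminal residues present in full_length but not in mature.

    Assumes mature is an exact C-terminal portion of full_length;
    raises ValueError otherwise.
    """
    if not full_length.endswith(mature):
        raise ValueError("mature sequence is not a C-terminal portion of full_length")
    return full_length[: len(full_length) - len(mature)]


# N-terminal fragments only (both truncated after the same residue).
seq_id_14_fragment = "MSKASALLAGLTGAALVAAHGHVSHIVVNGVYYRNYDP"
seq_id_15_fragment = "HGHVSHIVVNGVYYRNYDP"

signal = signal_sequence(seq_id_14_fragment, seq_id_15_fragment)
print(signal, len(signal))  # MSKASALLAGLTGAALVAA 19
```

The same comparison can be applied to the other full-length and signal-sequence-free pairs provided herein.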

The cDNA (SEQ ID NO:16) and amino acid (SEQ ID NO:17) sequences of an M. thermophila GH61a variant (“Variant 1”) are provided below. The signal sequence is underlined in SEQ ID NO:17. SEQ ID NO:18 provides the GH61a Variant 1 sequence without the signal sequence.

(SEQ ID NO: 16) ATGTCCAAGGCCTCTGCTCTCCTCGCTGGCCTGACGGGCGCGGCCCTCGTCGCTGCACACGGCCACGTCAGCCACATCGTCGTCAACGGCGTCTACTACAGGGGCTACGACCCCACGACAGACTGGTACCAGCCCAACCCGCCAACAGTCATCGGCTGGACGGCAGCCGATCAGGATAATGGCTTCGTTGAACCCAACAGCTTTGGCACGCCAGATATCATCTGCCACAAGAGCGCCACCCCCGGCGGCGGCCACGCTACCGTTGCTGCCGGAGACAAGATCAACATCGTCTGGACCCCCGAGTGGCCCCACTCCCACATCGGCCCCGTCATTGACTACCTAGCCGCCTGCAACGGTGACTGCGAGACCGTCGACAAGTCGTCGCTGCGCTGGTTCAAGATTGACGGCGCCGGCTACGACAAGGCCGCCGGCCGCTGGGCCGCCGACGCTCTGCGCGCCAACGGCAACAGCTGGCTCGTCCAGATCCCGTCGGATCTCAAGCCCGGCAACTACGTCCTCCGCCACGAGATCATCGCCCTCCACGGTGCTCAGAGCCCCAACGGCGCCCAGGCGTACCCGCAGTGCATCAACCTCCGCGTCACCGGCGGCGGCAGCAACCTGCCCAGCGGCGTCGCCGGCACCTCGCTGTACAAGGCGACCGACCCGGGCATCCTCTTCAACCCCTACGTCTCCTCCCCGGATTACACCGTCCCCGGCCCGGCCCTCATTGCCGGCGCCGCCAGCTCGATCGCCCAGAGCACGTCGGTCGCCACTGCCACCGGCACGGCCACCGTTCCCGGCGGCGGCGGCGCCAACCCTACCGCCACCACCACCGCCGCCACCTCCGCCGCCCCGAGCACCACCCTGAGGACGACCACTACCTCGGCCGCGCAGACTACCGCCCCGCCCTCCGGCGATGTGCAGACCAAGTACGGCCAGTGTGGTGGCAACGGATGGACGGGCCCGACGGTGTGCGCCCCCGGCTCGAGCTGCTCCGTCCTCAACGAGTGGTACTCCCAGTGTTTGTAA (SEQ ID NO: 17)MSKASALLAGLTGAALVAAHGHVSHIVVNGVYYRGYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIVWTPEWPHSHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLKPGNYVLRHEIIALHGAQSPNGAQAYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL (SEQ ID NO: 18)HGHVSHIVVNGVYYRGYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIVWTPEWPHSHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLKPGNYVLRHEIIALHGAQSPNGAQAYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL

The cDNA (SEQ ID NO:19) and amino acid (SEQ ID NO:20) sequences of an M. thermophila GH61a variant (“Variant 5”) are provided below. The signal sequence is underlined in SEQ ID NO:20. SEQ ID NO:21 provides the GH61a Variant 5 sequence without the signal sequence.

(SEQ ID NO: 19) ACACAAATGTCCAAGGCCTCTGCTCTCCTCGCTGGCCTGACGGGCGCGGCCCTCGTCGCTGCACACGGCCACGTCAGCCACATCGTCGTCAACGGCGTCTACTACAGGAACTACGACCCCACGACAGACTGGTACCAGCCCAACCCGCCAACAGTCATCGGCTGGACGGCAGCCGATCAGGATAATGGCTTCGTTGAACCCAACAGCTTTGGCACGCCAGATATCATCTGCCACAAGAGCGCCACCCCCGGCGGCGGCCACGCTACCGTTGCTGCCGGAGACAAGATCAACATCGTATGGACCCCCGAGTGGCCCCACTCCCACATCGGCCCCGTCATTGACTACCTAGCCGCCTGCAACGGTGACTGCGAGACCGTCGACAAGTCGTCGCTGCGCTGGTTCAAGATTGACGGCGCCGGCTACGACAAGGCCGCCGGCCGCTGGGCCGCCGACGCTCTGCGCGCCAACGGCAACAGCTGGCTCGTCCAGATCCCGTCGGATCTCGCGGCCGGCAACTACGTCCTCCGCCACGAGATCATCGCCCTCCACGGTGCTCAGAGCCCCAACGGCGCCCAGGCGTACCCGCAGTGCATCAACCTCCGCGTCACCGGCGGCGGCAGCAACCTGCCCAGCGGCGTCGCCGGCACCTCGCTGTACAAGGCGACCGACCCGGGCATCCTCTTCAACCCCTACGTCTCCTCCCCGGATTACACCGTCCCCGGCCCGGCCCTCATTGCCGGCGCCGCCAGCTCGATCGCCCAGAGCACGTCGGTCGCCACTGCCACCGGCACGGCCACCGTTCCCGGCGGCGGCGGCGCCAACCCTACCGCCACCACCACCGCCGCCACCTCCGCCGCCCCGAGCACCACCCTGAGGACGACCACTACCTCGGCCGCGCAGACTACCGCCCCGCCCTCCGGCGATGTGCAGACCAAGTACGGCCAGTGTGGTGGCAACGGATGGACGGGCCCGACGGTGTGCGCCCCCGGCTCGAGCTGCTCCGTCCTCAACGAGTGGTACTCCCAGTGTTTGTAA (SEQ ID NO: 20)MSKASALLAGLTGAALVAAHGHVSHIVVNGVYYRNYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIVWTPEWPHSHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLAAGNYVLRHEIIALHGAQSPNGAQAYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL (SEQ ID NO: 21)HGHVSHIVVNGVYYRNYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIVWTPEWPHSHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLAAGNYVLRHEIIALHGAQSPNGAQAYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL

The cDNA (SEQ ID NO:22) and amino acid (SEQ ID NO:23) sequences of an M. thermophila GH61a variant (“Variant 9”) are provided below. The signal sequence is underlined in SEQ ID NO:23. SEQ ID NO:24 provides the GH61a Variant 9 sequence without the signal sequence.

(SEQ ID NO: 22) ACAAACATGTCCAAGGCCTCTGCTCTCCTCGCTGGCCTGACGGGCGCGGCCCTCGTCGCTGCACATGGCCACGTCAGCCACATCGTCGTCAACGGCGTCTACTACAGGAACTACGACCCCACGACAGACTGGTACCAGCCCAACCCGCCAACAGTCATCGGCTGGACGGCAGCCGATCAGGATAATGGCTTCGTTGAACCCAACAGCTTTGGCACGCCAGATATCATCTGCCACAAGAGCGCCACCCCCGGCGGCGGCCACGCTACCGTTGCTGCCGGAGACAAGATCAACATCCAGTGGACCCCCGAGTGGCCCGAATCCCACATCGGCCCCGTCATTGACTACCTAGCCGCCTGCAACGGTGACTGCGAGACCGTCGACAAGTCGTCGCTGCGCTGGTTCAAGATTGACGGCGCCGGCTACGACAAGGCCGCCGGCCGCTGGGCCGCCGACGCTCTGCGCGCCAACGGCAACAGCTGGCTCGTCCAGATCCCGTCGGATCTCAAGGCCGGCAACTACGTCCTCCGCCACGAGATCATCGCCCTCCACGGTGCTCAGAGCCCCAACGGCGCCCAGAACTACCCGCAGTGCATCAACCTCCGCGTCACCGGCGGCGGCAGCAACCTGCCCAGCGGCGTCGCCGGCACCTCGCTGTACAAGGCGACCGACCCGGGCATCCTCTTCAACCCCTACGTCTCCTCCCCGGATTACACCGTCCCCGGCCCGGCCCTCATTGCCGGCGCCGCCAGCTCGATCGCCCAGAGCACGTCGGTCGCCACTGCCACCGGCACGGCCACCGTTCCCGGCGGCGGCGGCGCCAACCCTACCGCCACCACCACCGCCGCCACCTCCGCCGCCCCGAGCACCACCCTGAGGACGACCACTACCTCGGCCGCGCAGACTACCGCCCCGCCCTCCGGCGATGTGCAGACCAAGTACGGCCAGTGTGGTGGCAACGGATGGACGGGCCCGACGGTGTGCGCCCCCGGCTCGAGCTGCTCCGTCCTCAACGAGTGGTACTCCCAGTGTTTGTAA (SEQ ID NO: 23)MSKASALLAGLTGAALVAAHGHVSHIVVNGVYYRNYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIQWTPEWPESHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLKAGNYVLRHEIIALHGAQSPNGAQNYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL (SEQ ID NO: 24)HGHVSHIVVNGVYYRNYDPTTDWYQPNPPTVIGWTAADQDNGFVEPNSFGTPDIICHKSATPGGGHATVAAGDKINIQWTPEWPESHIGPVIDYLAACNGDCETVDKSSLRWFKIDGAGYDKAAGRWAADALRANGNSWLVQIPSDLKAGNYVLRHEIIALHGAQSPNGAQNYPQCINLRVTGGGSNLPSGVAGTSLYKATDPGILFNPYVSSPDYTVPGPALIAGAASSIAQSTSVATATGTATVPGGGGANPTATTTAATSAAPSTTLRTTTTSAAQTTAPPSGDVQTKYGQCGGNGWTGPTVCAPGSSCSVLNEWYSQCL

The polynucleotide (SEQ ID NO:25) and amino acid (SEQ ID NO:26) sequences of an M. thermophila GH61b are provided below. The signal sequence is shown underlined in SEQ ID NO:26. SEQ ID NO:27 provides the sequence of this GH61b without the signal sequence.

(SEQ ID NO: 25) ATGAAGCTCTCCCTCTTTTCCGTCCTGGCCACTGCCCTCACCGTCGAGGGGCATGCCATCTTCCAGAAGGTCTCCGTCAACGGAGCGGACCAGGGCTCCCTCACCGGCCTCCGCGCTCCCAACAACAACAACCCCGTGCAGAATGTCAACAGCCAGGACATGATCTGCGGCCAGTCGGGATCGACGTCGAACACTATCATCGAGGTCAAGGCCGGCGATAGGATCGGTGCCTGGTATCAGCATGTCATCGGCGGTGCCCAGTTCCCCAACGACCCAGACAACCCGATTGCCAAGTCGCACAAGGGCCCCGTCATGGCCTACCTCGCCAAGGTTGACAATGCCGCAACCGCCAGCAAGACGGGCCTGAAGTGGTTCAAGATTTGGGAGGATACCTTTAATCCCAGCACCAAGACCTGGGGTGTCGACAACCTCATCAACAACAACGGCTGGGTGTACTTCAACCTCCCGCAGTGCATCGCCGACGGCAACTACCTCCTCCGCGTCGAGGTCCTCGCTCTGCACTCGGCCTACTCCCAGGGCCAGGCTCAGTTCTACCAGTCCTGCGCCCAGATCAACGTATCCGGCGGCGGCTCCTTCACGCCGGCGTCGACTGTCAGCTTCCCGGGTGCCTACAGCGCCAGCGACCCCGGTATCCTGATCAACATCTACGGCGCCACCGGCCAGCCCGACAACAACGGCCAGCCGTACACTGCCCCTGGGCCCGCGCCCATCTCCTGC (SEQ ID NO: 26)MKLSLFSVLATALTVEGHAIFQKVSVNGADQGSLTGLRAPNNNNPVQNVNSQDMICGQSGSTSNTIIEVKAGDRIGAWYQHVIGGAQFPNDPDNPIAKSHKGPVMAYLAKVDNAATASKTGLKWFKIWEDTFNPSTKTWGVDNLINNNGWVYFNLPQCIADGNYLLRVEVLALHSAYSQGQAQFYQSCAQINVSGGGSFTPASTVSFPGAYSASDPGILINIYGATGQPDNNGQPYTAPGPAPISC (SEQ ID NO: 27)IFQKVSVNGADQGSLTGLRAPNNNNPVQNVNSQDMICGQSGSTSNTIIEVKAGDRIGAWYQHVIGGAQFPNDPDNPIAKSHKGPVMAYLAKVDNAATASKTGLKWFKIWEDTFNPSTKTWGVDNLINNNGWVYFNLPQCIADGNYLLRVEVLALHSAYSQGQAQFYQSCAQINVSGGGSFTPASTVSFPGAYSASDPGILINIYGATGQPDNNGQPYTAPGPAPISC

The polynucleotide (SEQ ID NO:28) and amino acid (SEQ ID NO:29) sequences of an M. thermophila GH61c are provided below. The signal sequence is shown underlined in SEQ ID NO:29. SEQ ID NO:30 provides the sequence of this GH61c without the signal sequence.

(SEQ ID NO: 28) ATGGCCCTCCAGCTCTTGGCGAGCTTGGCCCTCCTCTCAGTGCCGGCCCTTGCCCACGGTGGCTTGGCCAACTACACCGTCGGTGATACTTGGTACAGAGGCTACGACCCAAACCTGCCGCCGGAGACGCAGCTCAACCAGACCTGGATGATCCAGCGGCAATGGGCCACCATCGACCCCGTCTTCACCGTGTCGGAGCCGTACCTGGCCTGCAACAACCCGGGCGCGCCGCCGCCCTCGTACATCCCCATCCGCGCCGGTGACAAGATCACGGCCGTGTACTGGTACTGGCTGCACGCCATCGGGCCCATGAGCGTCTGGCTCGCGCGGTGCGGCGACACGCCCGCGGCCGACTGCCGCGACGTCGACGTCAACCGGGTCGGCTGGTTCAAGATCTGGGAGGGCGGCCTGCTGGAGGGTCCCAACCTGGCCGAGGGGCTCTGGTACCAAAAGGACTTCCAGCGCTGGGACGGCTCCCCGTCCCTCTGGCCCGTCACGATCCCCAAGGGGCTCAAGAGCGGGACCTACATCATCCGGCACGAGATCCTGTCGCTTCACGTCGCCCTCAAGCCCCAGTTTTACCCGGAGTGTGCGCATCTGAATATTACTGGGGGCGGAGACTTGCTGCCACCCGAAGAGACTCTGGTGCGGTTTCCGGGGGTTTACAAAGAGGACGATCCCTCTATCTTCATCGATGTCTACTCGGAGGAGAACGCGAACCGGACAGATTATACGGTTCCGGGAGGGCCA ATCTGGGAAGGG (SEQ IDNO: 29) MALQLLASLALLSVPALAHGGLANYTVGDTWYRGYDPNLPPETQLNQTWMIQRQWATIDPVFTVSEPYLACNNPGAPPPSYIPIRAGDKITAVYWYWLHAIGPMSVWLARCGDTPAADCRDVDVNRVGWFKIWEGGLLEGPNLAEGLWYQKDFQRWDGSPSLWPVTIPKGLKSGTYIIRHEILSLHVALKPQFYPECAHLNITGGGDLLPPEETLVRFPGVYKEDDPSIFIDVYSEENANRTDYTVPGGP IWEG (SEQ ID NO: 30)NYTVGDTWYRGYDPNLPPETQLNQTWMIQRQWATIDPVFTVSEPYLACNNPGAPPPSYIPIRAGDKITAVYWYWLHAIGPMSVWLARCGDTPAADCRDVDVNRVGWFKIWEGGLLEGPNLAEGLWYQKDFQRWDGSPSLWPVTIPKGLKSGTYIIRHEILSLHVALKPQFYPECAHLNITGGGDLLPPEETLVRFPGVYKEDDPSIFIDVYSEENANRTDYTVPGGPIWEG

The polynucleotide (SEQ ID NO:31) and amino acid (SEQ ID NO:32) sequences of an M. thermophila GH61d are provided below. The signal sequence is shown underlined in SEQ ID NO:32. SEQ ID NO:33 provides the sequence of this GH61d without the signal sequence.

(SEQ ID NO: 31) ATGAAGGCCCTCTCTCTCCTTGCGGCTGCCGGGGCAGTCTCTGCGCATACCATCTTCGTCCAGCTCGAAGCAGACGGCACGAGGTACCCGGTTTCGTACGGGATCCGGGACCCAACCTACGACGGCCCCATCACCGACGTCACATCCAACGACGTTGCTTGCAACGGCGGTCCGAACCCGACGACCCCCTCCAGCGACGTCATCACCGTCACCGCGGGCACCACCGTCAAGGCCATCTGGAGGCACACCCTCCAATCCGGCCCGGACGATGTCATGGACGCCAGCCACAAGGGCCCGACCCTGGCCTACATCAAGAAGGTCGGCGATGCCACCAAGGACTCGGGCGTCGGCGGTGGCTGGTTCAAGATCCAGGAGGACGGTTACAACAACGGCCAGTGGGGCACCAGCACCGTTATCTCCAACGGCGGCGAGCACTACATTGACATCCCGGCCTGCATCCCCGAGGGTCAGTACCTCCTCCGCGCCGAGATGATCGCCCTCCACGCGGCCGGGTCCCCCGGCGGCGCTCAGCTCTACATGGAATGTGCCCAGATCAACATCGTCGGCGGCTCCGGCTCGGTGCCCAGCTCGACGGTCAGCTTCCCCGGCGCGTATAGCCCCAACGACCCGGGTCTCCTCATCAACATCTATTCCATGTCGCCCTCGAGCTCGTACACCATCCCGGGCCCGCCCGTTTTCA AGTGC (SEQ ID NO: 32)MKALSLLAAAGAVSAHTIFVQLEADGTRYPVSYGIRDPTYDGPITDVTSNDVACNGGPNPTTPSSDVITVTAGTTVKAIWRHTLQSGPDDVMDASHKGPTLAYIKKVGDATKDSGVGGGWFKIQEDGYNNGQWGTSTVISNGGEHYIDIPACIPEGQYLLRAEMIALHAAGSPGGAQLYMECAQINIVGGSGSVPSSTVSFPGAYSPNDPGLLINIYSMSPSSSYTIPGPPVFKC (SEQ ID NO: 33)HTIFVQLEADGTRYPVSYGIRDPTYDGPITDVTSNDVACNGGPNPTTPSSDVITVTAGTTVKAIWRHTLQSGPDDVMDASHKGPTLAYIKKVGDATKDSGVGGGWFKIQEDGYNNGQWGTSTVISNGGEHYIDIPACIPEGQYLLRAEMIALHAAGSPGGAQLYMECAQINIVGGSGSVPSSTVSFPGAYSPNDPGLLIN IYSMSPSSSYTIPGPPVFKC

The polynucleotide (SEQ ID NO:34) and amino acid (SEQ ID NO:35) sequences of an M. thermophila GH61e are provided below. The signal sequence is shown underlined in SEQ ID NO:35. SEQ ID NO:36 provides the sequence of this GH61e without the signal sequence.

(SEQ ID NO: 34) ATGAAGTCGTCTACCCCGGCCTTGTTCGCCGCTGGGCTCCTTGCTCAGCATGCTGCGGCCCACTCCATCTTCCAGCAGGCGAGCAGCGGCTCGACCGACTTTGATACGCTGTGCACCCGGATGCCGCCCAACAATAGCCCCGTCACTAGTGTGACCAGCGGCGACATGACCTGCAAAGTCGGCGGCACCAAGGGGGTGTCCGGCTTCTGCGAGGTGAACGCCGGCGACGAGTTCACGGTTGAGATGCACGCGCAGCCCGGCGACCGCTCGTGCGCCAACGAGGCCATCGGCGGGAACCACTTCGGCCCGGTCCTCATCTACATGAGCAAGGTCGACGACGCCTCCACCGCCGACGGGTCCGGCGACTGGTTCAAGGTGGACGAGTTCGGCTACGACGCAAGCACCAAGACCTGGGGCACCGACAAGCTCAACGAGAACTGCGGCAAGCGCACCTTCAACATCCCCAGCCACATCCCCGCGGGCGACTATCTCGTCCGGGCCGAGGCTATCGCGCTACACACTGCCAACCAGCCAGGCGGCGCGCAGTTCTACATGAGCTGCTATCAAGTCAGGATTTCCGGCGGCGAAGGGGGCCAGCTGCCTGCCGGAGTCAAGATCCCGGGCGCGTACAGTGCCAACGACCCCGGCATCCTTGTCGACATCTGGGGTAACGATTTCAACGACCCTCCAGGACACTCGGCCCGTCACGCCATCATCATCATCAGCAGCAGCAGCAACAACAGCGGCGCCAAGATGACCAAGAAGATCCAGGAGCCCACCATCACATCGGTCACGGACCTCCCCACCGACGAGGCCAAGTGGATCGCGCTCCAAAAGATCTCGTACGTGGACCAGACGGGCACGGCGCGGACATACGAGCCGGCGTCGCGCAAGACGCGG TCGCCAAGAGTCTAG (SEQID NO: 35) MKSSTPALFAAGLLAQHAAAHSIFQQASSGSTDFDTLCTRMPPNNSPVTSVTSGDMTCKVGGTKGVSGFCEVNAGDEFTVEMHAQPGDRSCANEAIGGNHFGPVLIYMSKVDDASTADGSGDWFKVDEFGYDASTKTWGTDKLNENCGKRTFNIPSHIPAGDYLVRAEAIALHTANQPGGAQFYMSCYQVRISGGEGGQLPAGVKIPGAYSANDPGILVDIWGNDFNDPPGHSARHAIIIISSSSNNSGAKMTKKIQEPTITSVTDLPTDEAKWIALQKISYVDQTGTARTYEPASRKTR SPRV (SEQ ID NO: 36)HSIFQQASSGSTDFDTLCTRMPPNNSPVTSVTSGDMTCKVGGTKGVSGFCEVNAGDEFTVEMHAQPGDRSCANEAIGGNHFGPVLIYMSKVDDASTADGSGDWFKVDEFGYDASTKTWGTDKLNENCGKRTFNIPSHIPAGDYLVRAEAIALHTANQPGGAQFYMSCYQVRISGGEGGQLPAGVKIPGAYSANDPGILVDIWGNDFNDPPGHSARHAIIIISSSSNNSGAKMTKKIQEPTITSVTDLPTDEAKWIALQKISYVDQTGTARTYEPASRKTRSPRV

The polynucleotide (SEQ ID NO:37) and amino acid (SEQ ID NO:38) sequences of an alternative M. thermophila GH61e are provided below. The signal sequence is shown underlined in SEQ ID NO:38. SEQ ID NO:39 provides the sequence of this GH61e without the signal sequence.

(SEQ ID NO: 37) ATGAAGTCGTCTACCCCGGCCTTGTTCGCCGCTGGGCTCCTTGCTCAGCATGCTGCGGCCCACTCCATCTTCCAGCAGGCGAGCAGCGGCTCGACCGACTTTGATACGCTGTGCACCCGGATGCCGCCCAACAATAGCCCCGTCACTAGTGTGACCAGCGGCGACATGACCTGCAACGTCGGCGGCACCAAGGGGGTGTCGGGCTTCTGCGAGGTGAACGCCGGCGACGAGTTCACGGTTGAGATGCACGCGCAGCCCGGCGACCGCTCGTGCGCCAACGAGGCCATCGGCGGGAACCACTTCGGCCCGGTCCTCATCTACATGAGCAAGGTCGACGACGCCTCCACTGCCGACGGGTCCGGCGACTGGTTCAAGGTGGACGAGTTCGGCTACGACGCAAGCACCAAGACCTGGGGCACCGACAAGCTCAACGAGAACTGCGGCAAGCGCACCTTCAACATCCCCAGCCACATCCCCGCGGGCGACTATCTCGTCCGGGCCGAGGCTATCGCGCTACACACTGCCAACCAGCCAGGCGGCGCGCAGTTCTACATGAGCTGCTATCAAGTCAGGATTTCCGGCGGCGAAGGGGGCCAGCTGCCTGCCGGAGTCAAGATCCCGGGCGCGTACAGTGCCAACGACCCCGGCATCCTTGTCGACATCTGGGGTAACGATTTCAACGAGTACGTTATTCCGGGCCCCCCGGTCATCGACAGCAGCTACTTC (SEQ ID NO: 38)MKSSTPALFAAGLLAQHAAAHSIFQQASSGSTDFDTLCTRMPPNNSPVTSVTSGDMTCNVGGTKGVSGFCEVNAGDEFTVEMHAQPGDRSCANEAIGGNHFGPVLIYMSKVDDASTADGSGDWFKVDEFGYDASTKTWGTDKLNENCGKRTFNIPSHIPAGDYLVRAEAIALHTANQPGGAQFYMSCYQVRISGGEGGQLPAGVKIPGAYSANDPGILVDIWGNDFNEYVIPGPPVIDSSYF (SEQ ID NO: 39)HSIFQQASSGSTDFDTLCTRMPPNNSPVTSVTSGDMTCNVGGTKGVSGFCEVNAGDEFTVEMHAQPGDRSCANEAIGGNHFGPVLIYMSKVDDASTADGSGDWFKVDEFGYDASTKTWGTDKLNENCGKRTFNIPSHIPAGDYLVRAEAIALHTANQPGGAQFYMSCYQVRISGGEGGQLPAGVKIPGAYSANDPGILVDIWGNDFNEYVIPGPPVIDSSYF

The polynucleotide (SEQ ID NO:40) and amino acid (SEQ ID NO:41) sequences of an M. thermophila GH61f are provided below. The signal sequence is shown underlined in SEQ ID NO:41. SEQ ID NO:42 provides the sequence of this GH61f without the signal sequence.

(SEQ ID NO: 40) ATGAAGTCCTTCACCCTCACCACTCTGGCCGCCCTGGCTGGCAACGCCGCCGCTCACGCGACCTTCCAGGCCCTCTGGGTCGACGGCGTCGACTACGGCGCGCAGTGTGCCCGTCTGCCCGCGTCCAACTCGCCGGTCACCGACGTGACCTCCAACGCGATCCGCTGCAACGCCAACCCCTCGCCCGCTCGGGGCAAGTGCCCGGTCAAGGCCGGCTCGACCGTTACGGTCGAGATGCATCAGCAACCCGGTGACCGCTCGTGCAGCAGCGAGGCGATCGGCGGGGCGCACTACGGCCCCGTGATGGTGTACATGTCCAAGGTGTCGGACGCGGCGTCGGCGGACGGGTCGTCGGGCTGGTTCAAGGTGTTCGAGGACGGCTGGGCCAAGAACCCGTCCGGCGGGTCGGGCGACGACGACTACTGGGGCACCAAGGACCTGAACTCGTGCTGCGGGAAGATGAACGTCAAGATCCCCGCCGACCTGCCCTCGGGCGACTACCTGCTCCGGGCCGAGGCCCTCGCGCTGCACACGGCCGGCAGCGCGGGCGGCGCCCAGTTCTACATGACCTGCTACCAGCTCACCGTGACCGGCTCCGGCAGCGCCAGCCCGCCCACCGTCTCCTTCCCGGGCGCCTACAAGGCCACCGACCCGGGCATCCTCGTCAACATCCACGCCCCGCTGTCCGGCTACACCGTGCCCGGCCCGGCCGTCTACTCGGGCGGCTCCACCAAGAAGGCCGGCAGCGCCTGCACCGGCTGCGAGTCCACTTGCGCCGTCGGCTCCGGCCCCACCGCCACCGTCTCCCAGTCGCCCGGTTCCACCGCCACCTCGGCCCCCGGCGGCGGCGGCGGCTGCACCGTCCAGAAGTACCAGCAGTGCGGCGGCCAGGGCTACACCGGCTGCACCAACTGCGCGTCCGGCTCCACCTGCAGCGCGGTCTCGCCGCC CTACTACTCGCAGTGCGTC(SEQ ID NO: 41) MKSFTLTTLAALAGNAAAHATFQALWVDGVDYGAQCARLPASNSPVTDVTSNAIRCNANPSPARGKCPVKAGSTVTVEMHQQPGDRSCSSEAIGGAHYGPVMVYMSKVSDAASADGSSGWFKVFEDGWAKNPSGGSGDDDYWGTKDLNSCCGKMNVKIPADLPSGDYLLRAEALALHTAGSAGGAQFYMTCYQLTVTGSGSASPPTVSFPGAYKATDPGILVNIHAPLSGYTVPGPAVYSGGSTKKAGSACTGCESTCAVGSGPTATVSQSPGSTATSAPGGGGGCTVQKYQQCGGQGYTGCTNCASGSTCSAVSPPYYSQCV (SEQ ID NO: 42)HATFQALWVDGVDYGAQCARLPASNSPVTDVTSNAIRCNANPSPARGKCPVKAGSTVTVEMHQQPGDRSCSSEAIGGAHYGPVMVYMSKVSDAASADGSSGWFKVFEDGWAKNPSGGSGDDDYWGTKDLNSCCGKMNVKIPADLPSGDYLLRAEALALHTAGSAGGAQFYMTCYQLTVTGSGSASPPTVSFPGAYKATDPGILVNIHAPLSGYTVPGPAVYSGGSTKKAGSACTGCESTCAVGSGPTATVSQSPGSTATSAPGGGGGCTVQKYQQCGGQGYTGCTNCASGSTCSAVSPPY YSQCV

The polynucleotide (SEQ ID NO:43) and amino acid (SEQ ID NO:44) sequences of an M. thermophila GH61g are provided below. The signal sequence is shown underlined in SEQ ID NO:44. SEQ ID NO:45 provides the sequence of this GH61g without the signal sequence.

(SEQ ID NO: 43) ATGAAGGGACTCCTCGGCGCCGCCGCCCTCTCGCTGGCCGTCAGCGATGTCTCGGCCCACTACATCTTTCAGCAGCTGACGACGGGCGGCGTCAAGCACGCTGTGTACCAGTACATCCGCAAGAACACCAACTATAACTCGCCCGTGACCGATCTGACGTCCAACGACCTCCGCTGCAATGTGGGTGCTACCGGTGCGGGCACCGATACCGTCACGGTGCGCGCCGGCGATTCGTTCACCTTCACGACCGATACGCCCGTTTACCACCAGGGCCCGACCTCGATCTACATGTCCAAGGCCCCCGGCAGCGCGTCCGACTACGACGGCAGCGGCGGCTGGTTCAAGATCAAGGACTGGGCTGACTACACCGCCACGATTCCGGAATGTATTCCCCCCGGCGACTACCTGCTTCGCATCCAGCAACTCGGCATCCACAACCCTTGGCCCGCGGGCATCCCCCAGTTCTACATCTCTTGTGCCCAGATCACCGTGACTGGTGGCGGCAGTGCCAACCCCGGCCCGACCGTCTCCATCCCAGGCGCCTTCAAGGAGACCGACCCGGGCTACACTGTCAACATCTACAACAACTTCCACAACTACACCGTCCCTGGCCCAGCCGTCTTCACCTGCAACGGTAGCGGCGGCAACAACGGCGGCGGCTCCAACCCAGTCACCACCACCACCACCACCACCACCAGGCCGTCCACCAGCACCGCCCAGTCCCAGCCGTCGTCGAGCCCGACCAGCCCCTCCAGCTGCACCGTCGCGAAGTGGGGCCAGTGCGGAGGACAGGGTTACAGCGGCTGCACCGTGTGCGCGGCCGGGTCGACCTGCCAGAAGACCAACGACT ACTACAGCCAGTGCTTGTAG(SEQ ID NO: 44) MKGLLGAAALSLAVSDVSAHYIFQQLTTGGVKHAVYQYIRKNTNYNSPVTDLTSNDLRCNVGATGAGTDTVTVRAGDSFTFTTDTPVYHQGPTSIYMSKAPGSASDYDGSGGWFKIKDWADYTATIPECIPPGDYLLRIQQLGIHNPWPAGIPQFYISCAQITVTGGGSANPGPTVSIPGAFKETDPGYTVNIYNNFHNYTVPGPAVFTCNGSGGNNGGGSNPVTTTTTTTTRPSTSTAQSQPSSSPTSPSSCTVAKWGQCGGQGYSGCTVCAAGSTCQKTNDYYSQCL (SEQ ID NO: 45)HYIFQQLTTGGVKHAVYQYIRKNTNYNSPVTDLTSNDLRCNVGATGAGTDTVTVRAGDSFTFTTDTPVYHQGPTSIYMSKAPGSASDYDGSGGWFKIKDWADYTATIPECIPPGDYLLRIQQLGIHNPWPAGIPQFYISCAQITVTGGGSANPGPTVSIPGAFKETDPGYTVNIYNNFHNYTVPGPAVFTCNGSGGNNGGGSNPVTTTTTTTTRPSTSTAQSQPSSSPTSPSSCTVAKWGQCGGQGYSGC TVCAAGSTCQKTNDYYSQCL

The polynucleotide (SEQ ID NO:46) and amino acid (SEQ ID NO:47) sequences of an alternative M. thermophila GH61g are provided below. The signal sequence is shown underlined in SEQ ID NO:47. SEQ ID NO:48 provides the sequence of this GH61g without the signal sequence.

(SEQ ID NO: 46) CTGACGACGGGCGGCGTCAAGCACGCTGTGTACCAGTACATCCGCAAGAACACCAACTATAACTCGCCCGTGACCGATCTGACGTCCAACGACCTCCGCTGCAATGTGGGTGCTACCGGTGCGGGCACCGATACCGTCACGGTGCGCGCCGGCGATTCGTTCACCTTCACGACCGATACGCCCGTTTACCACCAGGGCCCGACCTCGATCTACATGTCCAAGGCCCCCGGCAGCGCGTCCGACTACGACGGCAGCGGCGGCTGGTTCAAGATCAAGGACTGGGGTGCCGACTTTAGCAGCGGCCAGGCCACCTGGACCTTGGCGTCTGACTACACCGCCACGATTCCGGAATGTATTCCCCCCGGCGACTACCTGCTTCGCATCCAGCAACTCGGCATCCACAACCCTTGGCCCGCGGGCATCCCCCAGTTCTACATCTCTTGTGCCCAGATCACCGTGACTGGTGGCGGCAGTGCCAACCCCGGCCCGACCGTCTCCATCCCAGGCGCCTTCAAGGAGACCGACCCGGGCTACACTGTCAACATCTACAACAACTTCCACAACTACACCGTCCCTGGCCCAGCCGTCTTCACCTGCAACGGTAGCGGCGGCAACAACGGCGGCGGCTCCAACCCAGTCACCACCACCACCACCACCACCACCAGGCCGTCCACCAGCACCGCCCAGTCCCAGCCGTCGTCGAGCCCGACCAGCCCCTCCAGCTGCACCGTCGCGAAGTGGGGCCAGTGCGGAGGACAGGGTTACAGCGGCTGCACCGTGTGCGCGGCCGGGTCGACCTGCCAGAAGACCAACGACTACTACAGCCAGTGCTTG (SEQ ID NO: 47)MKGLLGAAALSLAVSDVSAHYIFQQLTTGGVKHAVYQYIRKNTNYNSPVTDLTSNDLRCNVGATGAGTDTVTVRAGDSFTFTTDTPVYHQGPTSIYMSKAPGSASDYDGSGGWFKIKDWGADFSSGQATWTLASDYTATIPECIPPGDYLLRIQQLGIHNPWPAGIPQFYISCAQITVTGGGSANPGPTVSIPGAFKETDPGYTVNIYNNFHNYTVPGPAVFTCNGSGGNNGGGSNPVTTTTTTTTRPSTSTAQSQPSSSPTSPSSCTVAKWGQCGGQGYSGCTVCAAGSTCQKTNDYYS QCL (SEQ ID NO: 48)HYIFQQLTTGGVKHAVYQYIRKNTNYNSPVTDLTSNDLRCNVGATGAGTDTVTVRAGDSFTFTTDTPVYHQGPTSIYMSKAPGSASDYDGSGGWFKIKDWGADFSSGQATWTLASDYTATIPECIPPGDYLLRIQQLGIHNPWPAGIPQFYISCAQITVTGGGSANPGPTVSIPGAFKETDPGYTVNIYNNFHNYTVPGPAVFTCNGSGGNNGGGSNPVTTTTTTTTRPSTSTAQSQPSSSPTSPSSCTVAKWGQCGGQGYSGCTVCAAGSTCQKTNDYYSQCL

The polynucleotide (SEQ ID NO:49) and amino acid (SEQ ID NO:50) sequences of an M. thermophila GH61h are provided below. The signal sequence is shown underlined in SEQ ID NO:50. SEQ ID NO:51 provides the sequence of this GH61h without the signal sequence.

(SEQ ID NO: 49) ATGTCTTCCTTCACCTCCAAGGGTCTCCTTTCCGCCCTCATGGGCGCGGCAACGGTTGCCGCCCACGGTCACGTCACCAACATCGTCATCAACGGCGTCTCATACCAGAACTTCGACCCATTCACGCACCCTTATATGCAGAACCCTCCGACGGTTGTCGGCTGGACCGCGAGCAACACGGACAACGGCTTCGTCGGCCCCGAGTCCTTCTCTAGCCCGGACATCATCTGCCACAAGTCCGCCACCAACGCTGGCGGCCATGCCGTCGTCGCGGCCGGCGATAAGGTCTTCATCCAGTGGGACACCTGGCCCGAGTCGCACCACGGTCCGGTCATCGACTATCTCGCCGACTGCGGCGACGCGGGCTGCGAGAAGGTCGACAAGACCACGCTCAAGTTCTTCAAGATCAGCGAGTCCGGCCTGCTCGACGGCACTAACGCCCCCGGCAAGTGGGCGTCCGACACGCTGATCGCCAACAACAACTCGTGGCTGGTCCAGATCCCGCCCAACATCGCCCCGGGCAACTACGTCCTGCGCCACGAGATCATCGCCCTGCACAGCGCCGGCCAGCAGAACGGCGCCCAGAACTACCCTCAGTGCTTCAACCTGCAGGTCACCGGCTCCGGCACTCAGAAGCCCTCCGGCGTCCTCGGCACCGAGCTCTACAAGGCCACCGACGCCGGCATCCTGGCCAACATCTACACCTCGCCCGTCACCTACCAGATCCCCGGCCCGGCCATCATCTCGGGCGCCTCCGCCGTCCAGCAGACCACCTCGGCCATCACCGCCTCTGCTAGCGCCATCACCGGCTCCGCTACCGCCGCGCCCACGGCTGCCACCACCACCGCCGCCGCCGCCGCCACCACTACCACCACCGCTGGCTCCGGTGCTACCGCCACGCCCTCGACCGGCGGCTCTCCTTCTTCCGCCCAGCCTGCTCCTACCACCGCTGCCGCTACCTCCAGCCCTGCTCGCCCGACCCGCTGCGCTGGTCTGAAGAAGCGCCGTCGCCACGCCCGTGACGTCAAGGTTGCCCTC (SEQ ID NO: 50)MSSFTSKGLLSALMGAATVAAHGHVTNIVINGVSYQNFDPFTHPYMQNPPTVVGWTASNTDNGFVGPESFSSPDIICHKSATNAGGHAVVAAGDKVFIQWDTWPESHHGPVIDYLADCGDAGCEKVDKTTLKFFKISESGLLDGTNAPGKWASDTLIANNNSWLVQIPPNIAPGNYVLRHEIIALHSAGQQNGAQNYPQCFNLQVTGSGTQKPSGVLGTELYKATDAGILANIYTSPVTYQIPGPAIISGASAVQQTTSAITASASAITGSATAAPTAATTTAAAAATTTTTAGSGATATPSTGGSPSSAQPAPTTAAATSSPARPTRCAGLKKRRRHARDVKVAL (SEQ ID NO: 51)AHGHVTNIVINGVSYQNFDPFTHPYMQNPPTVVGWTASNTDNGFVGPESFSSPDIICHKSATNAGGHAVVAAGDKVFIQWDTWPESHHGPVIDYLADCGDAGCEKVDKTTLKFFKISESGLLDGTNAPGKWASDTLIANNNSWLVQIPPNIAPGNYVLRHEIIALHSAGQQNGAQNYPQCFNLQVTGSGTQKPSGVLGTELYKATDAGILANIYTSPVTYQIPGPAIISGASAVQQTTSAITASASAITGSATAAPTAATTTAAAAATTTTTAGSGATATPSTGGSPSSAQPAPTTAAATSSPARPTRCAGLKKRRRHARDVKVAL

The polynucleotide (SEQ ID NO:52) and amino acid (SEQ ID NO:53) sequences of an M. thermophila GH61i are provided below. The signal sequence is shown underlined in SEQ ID NO:53. SEQ ID NO:54 provides the sequence of this GH61i without the signal sequence.

(SEQ ID NO: 52) ATGAAGACGCTCGCCGCCCTCGTGGTCTCGGCCGCCCTCGTGGCCGCGCACGGCTATGTTGACCACGCCACGATCGGTGGCAAGGATTATCAGTTCTACCAGCCGTACCAGGACCCTTACATGGGCGACAACAAGCCCGATAGGGTTTCCCGCTCCATCCCGGGCAACGGCCCCGTGGAGGACGTCAACTCCATCGACCTCCAGTGCCACGCCGGTGCCGAACCGGCCAAGCTCCACGCCCCCGCCGCCGCCGGCTCGACCGTGACGCTCTACTGGACCCTCTGGCCCGACTCCCACGTCGGCCCCGTCATCACCTACATGGCTCGCTGCCCCGACACCGGCTGCCAGGACTGGTCCCCGGGAACTAAGCCCGTTTGGTTCAAGATCAAGGAAGGCGGCCGTGAGGGCACCTCCAATACCCCGCTCATGACGGCCCCCTCCGCCTACACCTACACGATCCCGTCCTGCCTCAAGAGCGGCTACTACCTCGTCCGCCACGAGATCATCGCCCTGCACTCGGCCTGGCAGTACCCCGGCGCCCAGTTCTACCCGGGCTGCCACCAGCTCCAGGTCACCGGCGGCGGCTCCACCGTGCCCTCTACCAACCTGGTCTCCTTCCCCGGCGCCTACAAGGGGAGCGACCCCGGCATCACCTACGACGCTTACAAGGCGCAACCTTACACCATCCCTGGCCCGGCCG TGTTTACCTGCTGA (SEQID NO: 53) MKTLAALVVSAALVAAHGYVDHATIGGKDYQFYQPYQDPYMGDNKPDRVSRSIPGNGPVEDVNSIDLQCHAGAEPAKLHAPAAAGSTVTLYWTLWPDSHVGPVITYMARCPDTGCQDWSPGTKPVWFKIKEGGREGTSNTPLMTAPSAYTYTIPSCLKSGYYLVRHEIIALHSAWQYPGAQFYPGCHQLQVTGGGSTVPSTNLVSFPGAYKGSDPGITYDAYKAQPYTIPGPAVFTC (SEQ ID NO: 54)YVDHATIGGKDYQFYQPYQDPYMGDNKPDRVSRSIPGNGPVEDVNSIDLQCHAGAEPAKLHAPAAAGSTVTLYWTLWPDSHVGPVITYMARCPDTGCQDWSPGTKPVWFKIKEGGREGTSNTPLMTAPSAYTYTIPSCLKSGYYLVRHEIIALHSAWQYPGAQFYPGCHQLQVTGGGSTVPSTNLVSFPGAYKGSDPGIT YDAYKAQPYTIPGPAVFTC

The polynucleotide (SEQ ID NO:55) and amino acid (SEQ ID NO:56) sequences of an alternative M. thermophila GH61i are provided below. The signal sequence is shown underlined in SEQ ID NO:56. SEQ ID NO:57 provides the sequence of this GH61i without the signal sequence.

(SEQ ID NO: 55) ATGAAGACGCTCGCCGCCCTCGTGGTCTCGGCCGCCCTCGTGGCCGCGCACGGCTATGTTGACCACGCCACGATCGGTGGCAAGGATTATCAGTTCTACCAGCCGTACCAGGACCCTTACATGGGCGACAACAAGCCCGATAGGGTTTCCCGCTCCATCCCGGGCAACGGCCCCGTGGAGGACGTCAACTCCATCGACCTCCAGTGCCACGCCGGTGCCGAACCGGCCAAGCTCCACGCCCCCGCCGCCGCCGGCTCGACCGTGACGCTCTACTGGACCCTCTGGCCCGACTCCCACGTCGGCCCCGTCATCACCTACATGGCTCGCTGCCCCGACACCGGCTGCCAGGACTGGTCCCCGGGAACTAAGCCCGTTTGGTTCAAGATCAAGGAAGGCGGCCGTGAGGGCACCTCCAATGTCTGGGCTGCTACCCCGCTCATGACGGCCCCCTCCGCCTACACCTACACGATCCCGTCCTGCCTCAAGAGCGGCTACTACCTCGTCCGCCACGAGATCATCGCCCTGCACTCGGCCTGGCAGTACCCCGGCGCCCAGTTCTACCCGGGCTGCCACCAGCTCCAGGTCACCGGCGGCGGCTCCACCGTGCCCTCTACCAACCTGGTCTCCTTCCCCGGCGCCTACAAGGGGAGCGACCCCGGCATCACCTACGACGCTTACAAGGCGCAACCTTACACCATCCCTGGCCCGGCCGTGTTTACCTGC (SEQ ID NO: 56)MKTLAALVVSAALVAAHGYVDHATIGGKDYQFYQPYQDPYMGDNKPDRVSRSIPGNGPVEDVNSIDLQCHAGAEPAKLHAPAAAGSTVTLYWTLWPDSHVGPVITYMARCPDTGCQDWSPGTKPVWFKIKEGGREGTSNVWAATPLMTAPSAYTYTIPSCLKSGYYLVRHEIIALHSAWQYPGAQFYPGCHQLQVTGGGSTVPSTNLVSFPGAYKGSDPGITYDAYKAQPYTIPGPAVFTC (SEQ ID NO: 57)YVDHATIGGKDYQFYQPYQDPYMGDNKPDRVSRSIPGNGPVEDVNSIDLQCHAGAEPAKLHAPAAAGSTVTLYWTLWPDSHVGPVITYMARCPDTGCQDWSPGTKPVWFKIKEGGREGTSNVWAATPLMTAPSAYTYTIPSCLKSGYYLVRHEIIALHSAWQYPGAQFYPGCHQLQVTGGGSTVPSTNLVSFPGAYKGSDPGITYDAYKAQPYTIPGPAVFTC

The polynucleotide (SEQ ID NO:58) and amino acid (SEQ ID NO:59) sequences of an M. thermophila GH61j are provided below. The signal sequence is shown underlined in SEQ ID NO:59. SEQ ID NO:60 provides the sequence of this GH61j without the signal sequence.

(SEQ ID NO: 58) ATGAGATACTTCCTCCAGCTCGCTGCGGCCGCGGCCTTTGCCGTGAACAGCGCGGCGGGTCACTACATCTTCCAGCAGTTCGCGACGGGCGGGTCCAAGTACCCGCCCTGGAAGTACATCCGGCGCAACACCAACCCGGACTGGCTGCAGAACGGGCCGGTGACGGACCTGTCGTCGACCGACCTGCGCTGCAACGTGGGCGGGCAGGTCAGCAACGGGACCGAGACCATCACCTTGAACGCCGGCGACGAGTTCAGCTTCATCCTCGACACGCCCGTCTACCATGCCGGCCCCACCTCGCTCTACATGTCCAAGGCGCCCGGAGCTGTGGCCGACTACGACGGCGGCGGGGCCTGGTTCAAGATCTACGACTGGGGTCCGTCGGGGACGAGCTGGACGTTGAGTGGCACGTACACTCAGAGAATTCCCAAGTGCATCCCTGACGGCGAGTACCTCCTCCGCATCCAGCAGATCGGGCTCCACAACCCCGGCGCCGCGCCACAGTTCTACATCAGCTGCGCTCAAGTCAAGGTCGTCGATGGCGGCAGCACCAATCCGACCCCGACCGCCCAGATTCCGGGAGCCTTCCACAGCAACGACCCTGGCTTGACTGTCAATATCTACAACGACCCTCTCACCAACTACGTCGTCCCGGGACCTAGAGTTTCGCACTGG (SEQ ID NO: 59)MRYFLQLAAAAAFAVNSAAGHYIFQQFATGGSKYPPWKYIRRNTNPDWLQNGPVTDLSSTDLRCNVGGQVSNGTETITLNAGDEFSFILDTPVYHAGPTSLYMSKAPGAVADYDGGGAWFKIYDWGPSGTSWTLSGTYTQRIPKCIPDGEYLLRIQQIGLHNPGAAPQFYISCAQVKVVDGGSTNPTPTAQIPGAFHSNDPGLTVNIYNDPLTNYVVPGPRVSHW (SEQ ID NO: 60)HYIFQQFATGGSKYPPWKYIRRNTNPDWLQNGPVTDLSSTDLRCNVGGQVSNGTETITLNAGDEFSFILDTPVYHAGPTSLYMSKAPGAVADYDGGGAWFKIYDWGPSGTSWTLSGTYTQRIPKCIPDGEYLLRIQQIGLHNPGAAPQFYISCAQVKVVDGGSTNPTPTAQIPGAFHSNDPGLTVNIYNDPLTNYVVPGP RVSHW

The polynucleotide (SEQ ID NO:61) and amino acid (SEQ ID NO:62) sequences of an M. thermophila GH61k are provided below. The signal sequence is shown underlined in SEQ ID NO:62. SEQ ID NO:63 provides the sequence of this GH61k without the signal sequence.

(SEQ ID NO: 61) ATGCACCCCTCCCTTCTTTTCACGCTTGGGCTGGCGAGCGTGCTTGTCCCCCTCTCGTCTGCACACACTACCTTCACGACCCTCTTCGTCAACGATGTCAACCAAGGTGATGGTACCTGCATTCGCATGGCGAAGAAGGGCAATGTCGCCACCCATCCTCTCGCAGGCGGTCTCGACTCCGAAGACATGGCCTGTGGTCGGGATGGTCAAGAACCCGTGGCATTTACGTGTCCGGCCCCAGCTGGTGCCAAGTTGACTCTCGAGTTTCGCATGTGGGCCGATGCTTCGCAGTCCGGATCGATCGATCCATCCCACCTTGGCGTCATGGCCATCTACCTCAAGAAGGTTTCCGACATGAAATCTGACGCGGCCGCTGGCCCGGGCTGGTTCAAGATTTGGGACCAAGGCTACGACTTGGCGGCCAAGAAGTGGGCCACCGAGAAGCTCATCGACAACAACGGCCTCCTGAGCGTCAACCTTCCAACCGGCTTACCAACCGGCTACTACCTCGCCCGCCAGGAGATCATCACGCTCCAAAACGTTACCAATGACAGGCCAGAGCCCCAGTTCTACGTCGGCTGCGCACAGCTCTACGTCGAGGGCACCTCGGACTCACCCATCCCCTCGGACAAGACGGTCTCCATTCCCGGCCACATCAGCGACCCGGCCGACCCGGGCCTGACCTTCAACGTCTACACGGGCGACGCATCCACCTACAAGCCGCCCGGCCCCGAGGTTTACTTCCCCACCACCACCACCACCACCTCCTCCTCCTCCTCCGGAAGCAGCGACAACAAGGGAGCCAGGCGCCAGCAAACCCCCGACGACAAGCAGGCCGACGGCCTCGTTCCAGCCGACTGCCTCGTCAAGAACGCGAACTGGTGCGCCGCTGCCCTGCCGCCGTACACCGACGAGGCCGGCTGCTGGGCCGCCGCCGAGGACTGCAACAAGCAGCTGGACGCGTGCTACACCAGCGCACCCCCCTCGGGCAGCAAGGGGTGCAAGGTCTGGGAGGAGCAGGTGTGCACCGTCGTCTCGCAGAAGTGCGAGGCCGGGGATTTCAAGGGGCCCCCGCAGCTCGGGAAGGAGCTCGGCGAGGGGATCGATGAGCCTATTCCGGGGGGAAAGCTGCCCCCGGCGGTCAACGCGGGAGAGAACGGGAATCATGGCGGAGGTGGTGGTGATGATGGTGATGATGATAATGATGAGGCCGGGGCTGGGGCAGCGTCGACTCCGACTTTTGCTGCTCCTGGTGCGGCCAAGACTCCCCAACCAAACTCCGAGAGGGCCCGGCGCCGTGAGGCGCATTGGCGGCGACTGGAATCTGCTGAG (SEQ ID NO: 62)MHPSLLFTLGLASVLVPLSSAHTTFTTLFVNDVNQGDGTCIRMAKKGNVATHPLAGGLDSEDMACGRDGQEPVAFTCPAPAGAKLTLEFRMWADASQSGSIDPSHLGVMAIYLKKVSDMKSDAAAGPGWFKIWDQGYDLAAKKWATEKLIDNNGLLSVNLPTGLPTGYYLARQEIITLQNVTNDRPEPQFYVGCAQLYVEGTSDSPIPSDKTVSIPGHISDPADPGLTFNVYTGDASTYKPPGPEVYFPTTTTTTSSSSSGSSDNKGARRQQTPDDKQADGLVPADCLVKNANWCAAALPPYTDEAGCWAAAEDCNKQLDACYTSAPPSGSKGCKVWEEQVCTVVSQKCEAGDFKGPPQLGKELGEGIDEPIPGGKLPPAVNAGENGNHGGGGGDDGDDDNDEAGAGAASTPTFAAPGAAKTPQPNSERARRREAHWRRLESAE (SEQ ID NO: 
63)HTTFTTLFVNDVNQGDGTCIRMAKKGNVATHPLAGGLDSEDMACGRDGQEPVAFTCPAPAGAKLTLEFRMWADASQSGSIDPSHLGVMAIYLKKVSDMKSDAAAGPGWFKIWDQGYDLAAKKWATEKLIDNNGLLSVNLPTGLPTGYYLARQEIITLQNVTNDRPEPQFYVGCAQLYVEGTSDSPIPSDKTVSIPGHISDPADPGLTFNVYTGDASTYKPPGPEVYFPTTTTTTSSSSSGSSDNKGARRQQTPDDKQADGLVPADCLVKNANWCAAALPPYTDEAGCWAAAEDCNKQLDACYTSAPPSGSKGCKVWEEQVCTVVSQKCEAGDFKGPPQLGKELGEGIDEPIPGGKLPPAVNAGENGNHGGGGGDDGDDDNDEAGAGAASTPTFAAPGAAKTPQPNSERARRREAHWRRLESAE
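
As an illustration only, the relationship between a full-length sequence and its counterpart "without the signal sequence" can be checked programmatically. The following minimal Python sketch assumes that the mature sequence is simply the full-length sequence with an N-terminal segment removed; the function name `split_signal_peptide` is hypothetical and is not part of the disclosure. The example uses only the first 30 residues of SEQ ID NO:62 and the first 9 residues of SEQ ID NO:63 as given above. Whitespace is stripped before comparison because stray spaces can appear within the sequences as printed.

```python
from typing import Tuple


def split_signal_peptide(full_length: str, mature: str) -> Tuple[str, str]:
    """Return (signal_peptide, mature) for two amino acid sequences.

    Assumption (illustrative, not from the source): the mature sequence is the
    full-length sequence minus an N-terminal signal peptide, so it must match
    the C-terminal end of the full-length sequence exactly.
    """
    # Remove any whitespace introduced by line wrapping and normalize case.
    full = "".join(full_length.split()).upper()
    mat = "".join(mature.split()).upper()
    if not full.endswith(mat):
        raise ValueError("mature sequence is not a C-terminal suffix of the full-length sequence")
    # Everything preceding the mature sequence is treated as the signal peptide.
    return full[: len(full) - len(mat)], mat


# Truncated fragments taken from SEQ ID NO:62 and SEQ ID NO:63 above.
full_fragment = "MHPSLLFTLGLASVLVPLSSAHTTFTTLFV"
mature_fragment = "HTTFTTLFV"
signal, mature = split_signal_peptide(full_fragment, mature_fragment)
print(signal, len(signal))
```

The same check may be applied to any full-length/mature pair in this listing, provided the complete sequences are supplied rather than the truncated fragments used here.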

The polynucleotide (SEQ ID NO:64) and amino acid (SEQ ID NO:65) sequences of an M. thermophila GH61l are provided below. The signal sequence is shown underlined in SEQ ID NO:65. SEQ ID NO:66 provides the sequence of this GH61l without the signal sequence.

(SEQ ID NO: 64) ATGTTTTCTCTCAAGTTCTTTATCTTGGCCGGTGGGCTTGCTGTCCTCACCGAGGCTCACATAAGACTAGTGTCGCCCGCCCCTTTTACCAACCCTGACCAGGGCCCCAGCCCACTCCTAGAGGCTGGCAGCGACTATCCCTGCCACAACGGCAATGGGGGCGGTTATCAGGGAACGCCAACCCAGATGGCAAAGGGTTCTAAGCAGCAGCTAGCCTTCCAGGGGTCTGCCGTTCATGGGGGTGGCTCCTGCCAAGTGTCCATCACCTACGACGAAAACCCGACCGCTCAGAGCTCCTTCAAGGTCATTCACTCGATTCAAGGTGGCTGCCCCGCCAGGGCCGAGACGATCCCGGATTGCAGCGCACAAAATATCAACGCCTGCAATATAAAGCCCGATAATGCCCAGATGGACACCCCGGATAAGTATGAGTTCACGATCCCGGAGGATCTCCCCAGTGGCAAGGCCACCCTCGCCTGGACATGGATCAACACTATCGGCAACCGCGAGTTTTATATGGCATGCGCCCCGGTTGAGATCACCGGCGACGGCGGTAGCGAGTCGGCTCTGGCTGCGCTGCCCGACATGGTCATTGCCAACATCCCGTCCATCGGAGGAACCTGCGCGACCGAGGAGGGGAAGTACTACGAATATCCCAACCCCGGTAAGTCGGTCGAAACCATCCCGGGCTGGACCGATTTGGTTCCCCTGCAAGGCGAATGCGGTGCTGCCTCCGGTGTCTCGGGCTCCGGCGGAAACGCCAGCAGTGCTACCCCTGCCGCAGGGGCCGCCCCGACTCCTGCTGTCCGCGGCCGCCGTCCCACCTGGAACGCC (SEQ ID NO: 65)MFSLKFFILAGGLAVLTEAHIRLVSPAPFTNPDQGPSPLLEAGSDYPCHNGNGGGYQGTPTQMAKGSKQQLAFQGSAVHGGGSCQVSITYDENPTAQSSFKVIHSIQGGCPARAETIPDCSAQNINACNIKPDNAQMDTPDKYEFTIPEDLPSGKATLAWTWINTIGNREFYMACAPVEITGDGGSESALAALPDMVIANIPSIGGTCATEEGKYYEYPNPGKSVETIPGWTDLVPLQGECGAASGVSGSGGNASSATPAAGAAPTPAVRGRRPTWNA (SEQ ID NO: 66)HIRLVSPAPFTNPDQGPSPLLEAGSDYPCHNGNGGGYQGTPTQMAKGSKQQLAFQGSAVHGGGSCQVSITYDENPTAQSSFKVIHSIQGGCPARAETIPDCSAQNINACNIKPDNAQMDTPDKYEFTIPEDLPSGKATLAWTWINTIGNREFYMACAPVEITGDGGSESALAALPDMVIANIPSIGGTCATEEGKYYEYPNPGKSVETIPGWTDLVPLQGECGAASGVSGSGGNASSATPAAGAAPTPAV RGRRPTWNA

The polynucleotide (SEQ ID NO:67) and amino acid (SEQ ID NO:68) sequences of an M. thermophila GH61m are provided below. The signal sequence is shown underlined in SEQ ID NO:68. SEQ ID NO:69 provides the sequence of this GH61m without the signal sequence.

(SEQ ID NO: 67) ATGAAGCTCGCCACGCTCCTCGCCGCCCTCACCCTCGGGGTGGCCGACCAGCTCAGCGTCGGGTCCAGAAAGTTTGGCGTGTACGAGCACATTCGCAAGAACACGAACTACAACTCGCCCGTTACCGACCTGTCGGACACCAACCTGCGCTGCAACGTCGGCGGGGGCTCGGGCACCAGCACCACCGTGCTCGACGTCAAGGCCGGAGACTCGTTCACCTTCTTCAGCGACGTTGCCGTCTACCACCAGGGGCCCATCTCGCTGTGCGTGGACCGGACCAGTGCAGAGAGCATGGATGGACGGGAACCGGACATGCGCTGCCGAACTGGCTCACAAGCTGGCTACCTGGCGGTGACTGACTACGACGGGTCCGGTGACTGTTTCAAGATCTATGACTGGGGACCGACGTTCAACGGGGGCCAGGCGTCGTGGCCGACGAGGAATTCGTACGAGTACAGCATCCTCAAGTGCATCAGGGACGGCGAATACCTACTGCGGATTCAGTCCCTGGCCATCCATAACCCAGGTGCCCTTCCGCAGTTCTACATCAGCTGCGCCCAGGTGAATGTGACGGGCGGAGGCACCGTCACCCCGAGATCAAGGCGACCGATCCTGATCTATTTCAACTTCCACTCGTATATCGTCCCTGGGCCGGCAGTGTTCAAGTGCTAG (SEQ ID NO: 68)MKLATLLAALTLGVADQLSVGSRKFGVYEHIRKNTNYNSPVTDLSDTNLRCNVGGGSGTSTTVLDVKAGDSFTFFSDVAVYHQGPISLCVDRTSAESMDGREPDMRCRTGSQAGYLAVTDYDGSGDCFKIYDWGPTFNGGQASWPTRNSYEYSILKCIRDGEYLLRIQSLAIHNPGALPQFYISCAQVNVTGGGTVTPRSRRPILIYFNFHSYIVPGPAVFKC (SEQ ID NO: 69)DQLSVGSRKFGVYEHIRKNTNYNSPVTDLSDTNLRCNVGGGSGTSTTVLDVKAGDSFTFFSDVAVYHQGPISLCVDRTSAESMDGREPDMRCRTGSQAGYLAVTDYDGSGDCFKIYDWGPTFNGGQASWPTRNSYEYSILKCIRDGEYLLRIQSLAIHNPGALPQFYISCAQVNVTGGGTVTPRSRRPILIYFNFHSYIV PGPAVFKC

The polynucleotide (SEQ ID NO:70) and amino acid (SEQ ID NO:71) sequences of an alternative M. thermophila GH61m are provided below. The signal sequence is shown underlined in SEQ ID NO:71. SEQ ID NO:72 provides the sequence of this GH61m without the signal sequence.

(SEQ ID NO: 70) ATGAAGCTCGCCACGCTCCTCGCCGCCCTCACCCTCGGGCTCAGCGTCGGGTCCAGAAAGTTTGGCGTGTACGAGCACATTCGCAAGAACACGAACTACAACTCGCCCGTTACCGACCTGTCGGACACCAACCTGCGCTGCAACGTCGGCGGGGGCTCGGGCACCAGCACCACCGTGCTCGACGTCAAGGCCGGAGACTCGTTCACCTTCTTCAGCGACGTTGCCGTCTACCACCAGGGGCCCATCTCGCTGTGCGTGGACCGGACCAGTGCAGAGAGCATGGATGGACGGGAACCGGACATGCGCTGCCGAACTGGCTCACAAGCTGGCTACCTGGCGGTGACTGTGATGACTGTGACTGACTACGACGGGTCCGGTGACTGTTTCAAGATCTATGACTGGGGACCGACGTTCAACGGGGGCCAGGCGTCGTGGCCGACGAGGAATTCGTACGAGTACAGCATCCTCAAGTGCATCAGGGACGGCGAATACCTACTGCGGATTCAGTCCCTGGCCATCCATAACCCAGGTGCCCTTCCGCAGTTCTACATCAGCTGCGCCCAGGTGAATGTGACGGGCGGAGGCACCATCTATTTCAACTTCCACTCGTATATCGTCCCTGGGCCGGCAGTGTTCAAGTGC (SEQ ID NO: 71)MKLATLLAALTLGLSVGSRKFGVYEHIRKNTNYNSPVTDLSDTNLRCNVGGGSGTSTTVLDVKAGDSFTFFSDVAVYHQGPISLCVDRTSAESMDGREPDMRCRTGSQAGYLAVTVMTVTDYDGSGDCFKIYDWGPTFNGGQASWPTRNSYEYSILKCIRDGEYLLRIQSLAIHNPGALPQFYISCAQVNVTGGGTIYFN FHSYIVPGPAVFKC (SEQID NO: 72) RKFGVYEHIRKNTNYNSPVTDLSDTNLRCNVGGGSGTSTTVLDVKAGDSFTFFSDVAVYHQGPISLCVDRTSAESMDGREPDMRCRTGSQAGYLAVTVMTVTDYDGSGDCFKIYDWGPTFNGGQASWPTRNSYEYSILKCIRDGEYLLRIQSLAIHNPGALPQFYISCAQVNVTGGGTIYFNFHSYIVPGPAVFKC

The polynucleotide (SEQ ID NO:73) and amino acid (SEQ ID NO:74) sequences of an M. thermophila GH61n are provided below.

(SEQ ID NO: 73) ATGACCAAGAATGCGCAGAGCAAGCAGGGCGTTGAGAACCCAACAAGCGGCGACATCCGCTGCTACACCTCGCAGACGGCGGCCAACGTCGTGACCGTGCCGGCCGGCTCGACCATTCACTACATCTCGACCCAGCAGATCAACCACCCCGGCCCGACTCAGTACTACCTGGCCAAGGTACCCCCCGGCTCGTCGGCCAAGACCTTTGACGGGTCCGGCGCCGTCTGGTTCAAGATCTCGACCACGATGCCTACCGTGGACAGCAACAAGCAGATGTTCTGGCCAGGGCAGAACACTTATGAGACCTCAAACACCACCATTCCCGCCAACACCCCGGACGGCGAGTACCTCCTTCGCGTCAAGCAGATCGCCCTCCACATGGCGTCTCAGCCCAACAAGGTCCAGTTCTACCTCGCCTGCACCCAGATCAAGATCACCGGTGGTCGCAACGGCACCCCCAGCCCGCTGGTCGCGCTGCCCGGAGCCTACAAGAGCACCGACCCCGGCATCCTGGTCGACATCTACTCCATGAAGCCCGAATCGTACCAGCCTCCCGGGCCGCCCGTCTGGCGCGGCTAA (SEQ ID NO: 74)MTKNAQSKQGVENPTSGDIRCYTSQTAANVVTVPAGSTIHYISTQQINHPGPTQYYLAKVPPGSSAKTFDGSGAVWFKISTTMPTVDSNKQMFWPGQNTYETSNTTIPANTPDGEYLLRVKQIALHMASQPNKVQFYLACTQIKITGGRNGTPSPLVALPGAYKSTDPGILVDIYSMKPESYQPPGPPVWRG

The polynucleotide (SEQ ID NO:75) and amino acid (SEQ ID NO:76) sequences of an alternative M. thermophila GH61n are provided below. The signal sequence is shown underlined in SEQ ID NO:76. SEQ ID NO:77 provides the sequence of this GH61n without the signal sequence.

(SEQ ID NO: 75) ATGAGGCTTCTCGCAAGCTTGTTGCTCGCAGCTACGGCTGTTCAAGCTCACTTTGTTAACGGACAGCCCGAAGAGAGTGACTGGTCAGCCACGCGCATGACCAAGAATGCGCAGAGCAAGCAGGGCGTTGAGAACCCAACAAGCGGCGACATCCGCTGCTACACCTCGCAGACGGCGGCCAACGTCGTGACCGTGCCGGCCGGCTCGACCATTCACTACATCTCGACCCAGCAGATCAACCACCCCGGCCCGACTCAGTACTACCTGGCCAAGGTACCCCCCGGCTCGTCGGCCAAGACCTTTGACGGGTCCGGCGCCGTCTGGTTCAAGATCTCGACCACGATGCCTACCGTGGACAGCAACAAGCAGATGTTCTGGCCAGGGCAGAACACTTATGAGACCTCAAACACCACCATTCCCGCCAACACCCCGGACGGCGAGTACCTCCTTCGCGTCAAGCAGATCGCCCTCCACATGGCGTCTCAGCCCAACAAGGTCCAGTTCTACCTCGCCTGCACCCAGATCAAGATCACCGGTGGTCGCAACGGCACCCCCAGCCCGCTGGTCGCGCTGCCCGGAGCCTACAAGAGCACCGACCCCGGCATCCTGGTCGACATCTACTCCATGAAGCCCGAATCGTACCAGCCTCCCGGGCCGCCCGTCTGGCGCGGC (SEQ ID NO: 76)MRLLASLLLAATAVQAHFVNGQPEESDWSATRMTKNAQSKQGVENPTSGDIRCYTSQTAANVVTVPAGSTIHYISTQQINHPGPTQYYLAKVPPGSSAKTFDGSGAVWFKISTTMPTVDSNKQMFWPGQNTYETSNTTIPANTPDGEYLLRVKQIALHMASQPNKVQFYLACTQIKITGGRNGTPSPLVALPGAYKSTDPGILVDIYSMKPESYQPPGPPVWRG (SEQ ID NO: 77)HFVNGQPEESDWSATRMTKNAQSKQGVENPTSGDIRCYTSQTAANVVTVPAGSTIHYISTQQINHPGPTQYYLAKVPPGSSAKTFDGSGAVWFKISTTMPTVDSNKQMFWPGQNTYETSNTTIPANTPDGEYLLRVKQIALHMASQPNKVQFYLACTQIKITGGRNGTPSPLVALPGAYKSTDPGILVDIYSMKPESYQP PGPPVWRG

The polynucleotide (SEQ ID NO:78) and amino acid (SEQ ID NO:79) sequences of an alternative M. thermophila GH61o are provided below. The signal sequence is shown underlined in SEQ ID NO:79. SEQ ID NO:80 provides the sequence of this GH61o without the signal sequence.

(SEQ ID NO: 78) ATGAAGCCCTTTAGCCTCGTCGCCCTGGCGACTGCCGTGAGCGGCCATGCCATCTTCCAGCGGGTGTCGGTCAACGGGCAGGACCAGGGCCAGCTCAAGGGGGTGCGGGCGCCGTCGAGCAACTCCCCGATCCAGAACGTCAACGATGCCAACATGGCCTGCAACGCCAACATTGTGTACCACGACAACACCATCATCAAGGTGCCCGCGGGAGCCCGCGTCGGCGCGTGGTGGCAGCACGTCATCGGCGGGCCGCAGGGCGCCAACGACCCGGACAACCCGATCGCCGCCTCCCACAAGGGCCCCATCCAGGTCTACCTGGCCAAGGTGGACAACGCGGCGACGGCGTCGCCGTCGGGCCTCAAGTGGTTCAAGGTGGCCGAGCGCGGCCTGAACAACGGCGTGTGGGCCTACCTGATGCGCGTCGAGCTGCTCGCCCTGCACAGCGCCTCGAGCCCCGGCGGCGCCCAGTTCTACATGGGCTGTGCACAGATCGAAGTCACTGGCTCCGGCACCAACTCGGGCTCCGACTTTGTCTCGTTCCCCGGCGCCTACTCGGCCAACGACCCGGGCATCTTGCTGAGCATCTACGACAGCTCGGGCAAGCCCAACAATGGCGGGCGCTCGTACCCGATCCCCGGCCCGCGCCCCATCTCCTGCTCCGGCAGCGGCGGCGGCGGCAACAACGGCGGCGACGGCGGCGACGACAACAACGGTGGTGGCAACAACAACGGCGGCGGCAGCGTCCCCCTGTACGGGCAGTGCGGCGGCATCGGCTACACGGGCCCGACCACCTGTGCCCAGGGAACTTGCAAGGTGTCGAACGAATACTACAGCCAGTGCCTCCCC (SEQ ID NO: 79)MKPFSLVALATAVSGHAIFQRVSVNGQDQGQLKGVRAPSSNSPIQNVNDANMACNANIVYHDNTIIKVPAGARVGAWWQHVIGGPQGANDPDNPIAASHKGPIQVYLAKVDNAATASPSGLKWFKVAERGLNNGVWAYLMRVELLALHSASSPGGAQFYMGCAQIEVTGSGTNSGSDFVSFPGAYSANDPGILLSIYDSSGKPNNGGRSYPIPGPRPISCSGSGGGGNNGGDGGDDNNGGGNNNGGGSVPLYGQCGGIGYTGPTTCAQGTCKVSNEYYSQCLP (SEQ ID NO: 80)HAIFQRVSVNGQDQGQLKGVRAPSSNSPIQNVNDANMACNANIVYHDNTIIKVPAGARVGAWWQHVIGGPQGANDPDNPIAASHKGPIQVYLAKVDNAATASPSGLKWFKVAERGLNNGVWAYLMRVELLALHSASSPGGAQFYMGCAQIEVTGSGTNSGSDFVSFPGAYSANDPGILLSIYDSSGKPNNGGRSYPIPGPRPISCSGSGGGGNNGGDGGDDNNGGGNNNGGGSVPLYGQCGGIGYTGPTT CAQGTCKVSNEYYSQCLP

The polynucleotide (SEQ ID NO:81) and amino acid (SEQ ID NO:82) sequences of an M. thermophila GH61p are provided below. The signal sequence is shown underlined in SEQ ID NO:82. SEQ ID NO:83 provides the sequence of this GH61p without the signal sequence.

(SEQ ID NO: 81) ATGAAGCTCACCTCGTCCCTCGCTGTCCTGGCCGCTGCCGGCGCCCAGGCTCACTATACCTTCCCTAGGGCCGGCACTGGTGGTTCGCTCTCTGGCGAGTGGGAGGTGGTCCGCATGACCGAGAACCATTACTCGCACGGCCCGGTCACCGATGTCACCAGCCCCGAGATGACCTGCTATCAGTCCGGCGTGCAGGGTGCGCCCCAGACCGTCCAGGTCAAGGCGGGCTCCCAATTCACCTTCAGCGTGGATCCCTCCATCGGCCACCCCGGCCCTCTCCAGTTCTACATGGCTAAGGTGCCGTCGGGCCAGACGGCCGCCACCTTTGACGGCACGGGAGCCGTGTGGTTCAAGATCTACCAAGACGGCCCGAACGGCCTCGGCACCGACAGCATTACCTGGCCCAGCGCCGGCAAAACCGAGGTCTCGGTCACCATCCCCAGCTGCATCGAGGATGGCGAGTACCTGCTCCGGGTCGAGCACACCCCCCTCCCTACAGCGCCAGCAGCGCAAAACCGAGCTCGCTCGTCACCATCCCCAGCTGCATACAAGGCCACCGACCCGGGCATCCTCTTCCAGCTCTACTGGCCCATCCCGACCGAGTACATCAACCCCGGCCCGGCCCCCGTCTCTTGCTAA (SEQ ID NO: 82)MKLTSSLAVLAAAGAQAHYTFPRAGTGGSLSGEWEVVRMTENHYSHGPVTDVTSPEMTCYQSGVQGAPQTVQVKAGSQFTFSVDPSIGHPGPLQFYMAKVPSGQTAATFDGTGAVWFKIYQDGPNGLGTDSITWPSAGKTEVSVTIPSCIEDGEYLLRVEHTPLPTAPAAQNRARSSPSPAAYKATDPGILFQLYWPIPT EYINPGPAPVSC (SEQ IDNO: 83) HYTFPRAGTGGSLSGEWEVVRMTENHYSHGPVTDVTSPEMTCYQSGVQGAPQTVQVKAGSQFTFSVDPSIGHPGPLQFYMAKVPSGQTAATFDGTGAVWFKIYQDGPNGLGTDSITWPSAGKTEVSVTIPSCIEDGEYLLRVEHTPLPTAPAAQNRARSSPSPAAYKATDPGILFQLYWPIPTEYINPGPAPVSC

The polynucleotide (SEQ ID NO:84) and amino acid (SEQ ID NO:85) sequences of an alternative M. thermophila GH61p are provided below. The signal sequence is shown underlined in SEQ ID NO:85. SEQ ID NO:86 provides the sequence of this GH61p without the signal sequence.

(SEQ ID NO: 84) ATGAAGCTCACCTCGTCCCTCGCTGTCCTGGCCGCTGCCGGCGCCCAGGCTCACTATACCTTCCCTAGGGCCGGCACTGGTGGTTCGCTCTCTGGCGAGTGGGAGGTGGTCCGCATGACCGAGACCATTACTCGCACGGCCCGGTCACCGATGTCACCAGCCCCGAGATGACCTGCTATCAGTCCGGCGTGCAGGGTGCGCCCCAGACCGTCCAGGTCAAGGCGGGCTCCCAATTCACCTTCAGCGTGGATCCCTCCATCGGCCACCCCGGCCCTCTCCAGTTCTACATGGCTAAGGTGCCGTCGGGCCAGACGGCCGCCACCTTTGACGGCACGGGAGCCGTGTGGTTCAAGATCTACCAAGACGGCCCGAACGGCCTCGGCACCGACAGCATTACCTGGCCCAGCGCCGGCAAAACCGAGGTCTCGGTCACCATCCCCAGCTGCATCGAGGATGGCGAGTACCTGCTCCGGGTCGAGCACATCGCGCTCCACAGCGCCAGCAGCGTGGGCGGCGCCCAGTTCTACATCGCCTGCGCCCAGCTCTCCGTCACCGGCGGCTCCGGCACCCTCAACACGGGCTCGCTCGTCTCCCTGCCCGGCGCCTACAAGGCCACCGACCCGGGCATCCTCTTCCAGCTCTACTGGCCCATCCCGACCGAGTACATCAACCCCGGCCCGGCCCCCGTCTCTTGC (SEQ ID NO: 85)MKLTSSLAVLAAAGAQAHYTFPRAGTGGSLSGEWEVVRMTENHYSHGPVTDVTSPEMTCYQSGVQGAPQTVQVKAGSQFTFSVDPSIGHPGPLQFYMAKVPSGQTAATFDGTGAVWFKIYQDGPNGLGTDSITWPSAGKTEVSVTIPSCIEDGEYLLRVEHIALHSASSVGGAQFYIACAQLSVTGGSGTLNTGSLVSLPGAYKATDPGILFQLYWPIPTEYINPGPAPVSC (SEQ ID NO: 86)HYTFPRAGTGGSLSGEWEVVRMTENHYSHGPVTDVTSPEMTCYQSGVQGAPQTVQVKAGSQFTFSVDPSIGHPGPLQFYMAKVPSGQTAATFDGTGAVWFKIYQDGPNGLGTDSITWPSAGKTEVSVTIPSCIEDGEYLLRVEHIALHSASSVGGAQFYIACAQLSVTGGSGTLNTGSLVSLPGAYKATDPGILFQLYWP IPTEYINPGPAPVSC

The polynucleotide (SEQ ID NO:87) and amino acid (SEQ ID NO:88) sequences of an alternative M. thermophila GH61q are provided below. The signal sequence is shown underlined in SEQ ID NO:88. SEQ ID NO:89 provides the sequence of this GH61q without the signal sequence.

(SEQ ID NO: 87) ATGCCGCCACCACGACTGAGCACCCTCCTTCCCCTCCTAGCCTTAATAGCCCCCACCGCCCTGGGGCACTCCCACCTCGGGTACATCATCATCAACGGCGAGGTATACCAAGGATTCGACCCGCGGCCGGAGCAGGCGAACTCGCCGTTGCGCGTGGGCTGGTCGACGGGGGCAATCGACGACGGGTTCGTGGCGCCGGCCAACTACTCGTCGCCCGACATCATCTGCCACATCGAGGGGGCCAGCCCGCCGGCGCACGCGCCCGTCCGGGCGGGCGACCGGGTGCACGTGCAATGGAACGGCTGGCCGCTCGGACACGTGGGGCCGGTGCTGTCGTACCTGGCGCCCTGCGGCGGGCTGGAGGGGTCCGAGAGCGGGTGCGCCGGGGTGGACAAGCGGCAGCTGCGGTGGACCAAGGTGGACGACTCGCTGCCGGCGATGGAGCTG (SEQ ID NO: 88)MPPPRLSTLLPLLALIAPTALGHSHLGYIIINGEVYQGFDPRPEQANSPLRVGWSTGAIDDGFVAPANYSSPDIICHIEGASPPAHAPVRAGDRVHVQWNGWPLGHVGPVLSYLAPCGGLEGSESGCAGVDKRQLRWTKVDDSLPAMEL (SEQ ID NO: 89)HSHLGYIIINGEVYQGFDPRPEQANSPLRVGWSTGAIDDGFVAPANYSSPDIICHIEGASPPAHAPVRAGDRVHVQWNGWPLGHVGPVLSYLAPCGGLEGSESGCAGVDKRQLRWTKVDDSLPAMEL

The polynucleotide (SEQ ID NO:90) and amino acid (SEQ ID NO:91) sequences of an alternative M. thermophila GH61q are provided below. The signal sequence is shown underlined in SEQ ID NO:91. SEQ ID NO:92 provides the sequence of this GH61q without the signal sequence.

(SEQ ID NO: 90) ATGCCGCCACCACGACTGAGCACCCTCCTTCCCCTCCTAGCCTTAATAGCCCCCACCGCCCTGGGGCACTCCCACCTCGGGTACATCATCATCAACGGCGAGGTATACCAAGGATTCGACCCGCGGCCGGAGCAGGCGAACTCGCCGTTGCGCGTGGGCTGGTCGACGGGGGCAATCGACGACGGGTTCGTGGCGCCGGCCAACTACTCGTCGCCCGACATCATCTGCCACATCGAGGGGGCCAGCCCGCCGGCGCACGCGCCCGTCCGGGCGGGCGACCGGGTGCACGTGCAATGGAAACGGCTGGCCGCTCGGACACGTGGGGCCGGTGCTGTCGTACCTGGCGCCCTGCGGCGGGCTGGAGGGGTCCGAGAGCGGGTGGACGACTCGCTGCCGGCGATGGAGCTGGTCGGGGCCGCGGGGGGCGCGGGGGGCGAGGACGACGGCAGCGGCAGCGACGGCAGCGGCAGCGGCGGCAGCGGACGCGTCGGCGTGCCCGGGCAGCGCTGGGCCACCGACGTGTTGATCGCGGCCAACAACAGCTGGCAGGTCGAGATCCCGCGCGGGCTGCGGGACGGGCCGTACGTGCTGCGCCACGAGATCGTCGCGCTGCACTACGCGGCCGAGCCCGGCGGCGCGCAGAACTACCCGCTCTGCGTCAACCTGTGGGTCGAGGGCGGCGACGGCAGCATGGAGCTGGACCACTTCGACGCCACCCAGTTCTACCGGCCCGACGACCCGGGCATCCTGCTCAACGTGACGGCCGGCCTGCGCTCATACGCCGTGCCGGGCCCGACGCTGGCCGCGGGGGCGACGCCGGTGCCGTACGCGCAGCAGAACATCAGCTCGGCGAGGGCGGATGGAACCCCCGTGATTGTCACCAGGAGCACGGAGACGGTGCCCTTCACCGCGGCACCCACGCCAGCCGAGACGGCAGAAGCCAAAGGGGGGAGGTATGATGACCAAACCCGAACTAAAGACCTAAATGAACGCTTCTTTTATAGTAGCCGGCCAGAACAGAAGAGGCTGACAGCGACCTCAAGAAGGGAACTAGTTGATCATCGTACCCGGTACCTCTCCGTAGCTGTCTGCGCAGATTTCGGCGCTCATAAGGCAGCAGAAACCAACCACGAAGCTTTGAGAGGCGGCAATAAGCACCATGGCGGTGTTTCAGAG (SEQ ID NO: 91)MPPPRLSTLLPLLALIAPTALGHSHLGYIIINGEVYQGFDPRPEQANSPLRVGWSTGAIDDGFVAPANYSSPDIICHIEGASPPAHAPVRAGDRVHVQWKRLAARTRGAGAVVPGALRRAGGVRERVDDSLPAMELVGAAGGAGGEDDGSGSDGSGSGGSGRVGVPGQRWATDVLIAANNSWQVEIPRGLRDGPYVLRHEIVALHYAAEPGGAQNYPLCVNLWVEGGDGSMELDHFDATQFYRPDDPGILLNVTAGLRSYAVPGPTLAAGATPVPYAQQNISSARADGTPVIVTRSTETVPFTAAPTPAETAEAKGGRYDDQTRTKDLNERFFYSSRPEQKRLTATSRRELVDHRTRYLSVAVCADFGAHKAAETNHEALRGGNKHHGGVSE (SEQ ID NO: 92)HSHLGYIIINGEVYQGFDPRPEQANSPLRVGWSTGAIDDGFVAPANYSSPDIICHIEGASPPAHAPVRAGDRVHVQWKRLAARTRGAGAVVPGALRRAGGVRERVDDSLPAMELVGAAGGAGGEDDGSGSDGSGSGGSGRVGVPGQRWATDVLIAANNSWQVEIPRGLRDGPYVLRHEIVALHYAAEPGGAQNYPLCVNLWVEGGDGSMELDHFDATQFYRPDDPGILLNVTAGLRSYAVPGPTLAAGATPVPYAQQNISSARADGTPVIVTRSTETVPFTAAPTPAETAEAKGGRYDDQTRTKDLNERFFYSSRPEQKRLTATSRRELVDHRTRYLSVAVCADFGAHKA AETNHEALRGGNKHHGGVSE

The polynucleotide (SEQ ID NO:93) and amino acid (SEQ ID NO:94) sequences of an M. thermophila GH61r are provided below. The signal sequence is shown underlined in SEQ ID NO:94. SEQ ID NO:95 provides the sequence of this GH61r without the signal sequence.

(SEQ ID NO: 93) ATGAGGTCGACATTGGCCGGTGCCCTGGCAGCCATCGCTGCTCAGAAAGTAGCCGGCCACGCCACGTTTCAGCAGCTCTGGCACGGCTCCTCCTGTGTCCGCCTTCCGGCTAGCAACTCACCCGTCACCAATGTGGGAAGCAGAGACTTCGTCTGCAACGCTGGCACCCGCCCCGTCAGTGGCAAGTGCCCCGTGAAGGCTGGCGGCACCGTCACCATCGAGATGCACCAGCAACCCGGCGACCGCAGCTGCAACAACGAAGCCATCGGAGGGGCGCATTGGGGCCCCGTCCAGGTGTACCTGACCAAGGTTCAGGACGCCGCGACGGCCGACGGCTCGACGGGCTGGTTCAAGATCTTCTCCGACTCGTGGTCCAAGAAGCCCGGGGGCAACTTGGGCGACGACGACAACTGGGGCACGCGCGACCTGAACGCCTGCTGCGGGAAGATGGAC (SEQ ID NO: 94)MRSTLAGALAAIAAQKVAGHATFQQLWHGSSCVRLPASNSPVTNVGSRDFVCNAGTRPVSGKCPVKAGGTVTIEMHQQPGDRSCNNEAIGGAHWGPVQVYLTKVQDAATADGSTGWFKIFSDSWSKKPGGNLGDDDNWGTRDLNACCGKMD (SEQ ID NO: 95)HATFQQLWHGSSCVRLPASNSPVTNVGSRDFVCNAGTRPVSGKCPVKAGGTVTIEMHQQPGDRSCNNEAIGGAHWGPVQVYLTKVQDAATADGSTGWFKIFSDSWSKKPGGNLGDDDNWGTRDLNACCGKMD

The polynucleotide (SEQ ID NO:96) and amino acid (SEQ ID NO:97) sequences of an alternative M. thermophila GH61r are provided below. The signal sequence is shown underlined in SEQ ID NO:97. SEQ ID NO:98 provides the sequence of this GH61r without the signal sequence.

(SEQ ID NO: 96) ATGAGGTCGACATTGGCCGGTGCCCTGGCAGCCATCGCTGCTCAGAAAGTAGCCGGCCACGCCACGTTTCAGCAGCTCTGGCACGGCTCCTCCTGTGTCCGCCTTCCGGCTAGCAACTCACCCGTCACCAATGTGGGAAGCAGAGACTTCGTCTGCAACGCTGGCACCCGCCCCGTCAGTGGCAAGTGCCCCGTGAAGGCTGGCGGCACCGTCACCATCGAGATGCACCAGCAACCCGGCGACCGCAGCTGCAACAACGAAGCCATCGGAGGGGCGCATTGGGGCCCCGTCCAGGTGTACCTGACCAAGGTTCAGGACGCCGCGACGGCCGACGGCTCGACGGGCTGGTTCAAGATCTTCTCCGACTCGTGGTCCAAGAAGCCCGGGGGCAACTCGGGCGACGACGACAACTGGGGCACGCGCGACCTGAACGCCTGCTGCGGGAAGATGGACGTGGCCATCCCGGCCGACATCGCGTCGGGCGACTACCTGCTGCGGGCCGAGGCGCTGGCCCTGCACACGGCCGGACAGGCCGGCGGCGCCCAGTTCTACATGAGCTGCTACCAGATGACGGTCGAGGGCGGCTCCGGGACCGCCAACCCGCCCACCGTCAAGTTCCCGGGCGCCTACAGCGCCAACGACCCGGGCATCCTCGTCAACATCCACGCCCCCCTTTCCAGCTACACCGCGCCCGGCCCGGCCGTCTACGCGGGCGGCACCATCCGCGAGGCCGGCTCCGCCTGCACCGGCTGCGCGCAGACCTGCAAGGTCGGGTCGTCCCCGAGCGCCGTTGCCCCCGGCAGCGGCGCGGGCAACGGCGGCGGGTTCCAACCCCGA (SEQ ID NO: 97)MRSTLAGALAAIAAQKVAGHATFQQLWHGSSCVRLPASNSPVTNVGSRDFVCNAGTRPVSGKCPVKAGGTVTIEMHQQPGDRSCNNEAIGGAHWGPVQVYLTKVQDAATADGSTGWFKIFSDSWSKKPGGNSGDDDNWGTRDLNACCGKMDVAIPADIASGDYLLRAEALALHTAGQAGGAQFYMSCYQMTVEGGSGTANPPTVKFPGAYSANDPGILVNIHAPLSSYTAPGPAVYAGGTIREAGSACTGCAQTCKVGSSPSAVAPGSGAGNGGGFQPR (SEQ ID NO: 98)HATFQQLWHGSSCVRLPASNSPVTNVGSRDFVCNAGTRPVSGKCPVKAGGTVTIEMHQQPGDRSCNNEAIGGAHWGPVQVYLTKVQDAATADGSTGWFKIFSDSWSKKPGGNSGDDDNWGTRDLNACCGKMDVAIPADIASGDYLLRAEALALHTAGQAGGAQFYMSCYQMTVEGGSGTANPPTVKFPGAYSANDPGILVNIHAPLSSYTAPGPAVYAGGTIREAGSACTGCAQTCKVGSSPSAVAPGSG AGNGGGFQPR

The polynucleotide (SEQ ID NO:99) and amino acid (SEQ ID NO:100) sequences of an M. thermophila GH61s are provided below. The signal sequence is shown underlined in SEQ ID NO:100. SEQ ID NO:101 provides the sequence of this GH61s without the signal sequence.

(SEQ ID NO: 99) ATGCTCCTCCTCACCCTAGCCACACTCGTCACCCTCCTGGCGCGCCACGTCTCGGCTCACGCCCGGCTGTTCCGCGTCTCTGTCGACGGGAAAGACCAGGGCGACGGGCTGAACAAGTACATCCGCTCGCCGGCGACCAACGACCCCGTGCGCGACCTCTCGAGCGCCGCCATCGTGTGCAACACCCAGGGGTCCAAGGCCGCCCCGGACTTCGTCAGGGCCGCGGCCGGCGACAAGCTGACCTTCCTCTGGGCGCACGACAACCCGGACGACCCGGTCGACTACGTCCTCGACCCGTCCCACAAGGGCGCCATCCTGACCTACGTCGCCGCCTACCCCTCCGGGGACCCGACCGGCCCCATCTGGAGCAAGCTTGCCGAGGAAGGATTCACCGGCGGGCAGTGGGCGACCATCAAGATGATCGACAACGGCGGCAAGGTCGACGTGACGCTGCCCGAGGCCCTTGCGCCGGGAAAGTACCTGATCCGCCAGGAGCTGCTGGCCCTGCACCGGGCCGACTTTGCCTGCGACGACCCGGCCCACCCCAACCGCGGCGCCGAGTCGTACCCCAACTGCGTCCAGGTGGAGGTGTCGGGCAGCGGCGACAAGAAGCCGGACCAGAACTTTGACTTCAACAAGGGCTATACCTGCGATAACAAAGGACTCCACTTTAAGATCTACATCGGTCAGGACAGCCAGTATGTGGCCCCGGGGCCGCGGCCTTGGAATGGGAGC (SEQ ID NO: 100)MLLLTLATLVTLLARHVSAHARLFRVSVDGKDQGDGLNKYIRSPATNDPVRDLSSAAIVCNTQGSKAAPDFVRAAAGDKLTFLWAHDNPDDPVDYVLDPSHKGAILTYVAAYPSGDPTGPIWSKLAEEGFTGGQWATIKMIDNGGKVDVTLPEALAPGKYLIRQELLALHRADFACDDPAHPNRGAESYPNCVQVEVSGSGDKKPDQNFDFNKGYTCDNKGLHFKIYIGQDSQYVAPGPRPWNGS (SEQ ID NO: 101)HARLFRVSVDGKDQGDGLNKYIRSPATNDPVRDLSSAAIVCNTQGSKAAPDFVRAAAGDKLTFLWAHDNPDDPVDYVLDPSHKGAILTYVAAYPSGDPTGPIWSKLAEEGFTGGQWATIKMIDNGGKVDVTLPEALAPGKYLIRQELLALHRADFACDDPAHPNRGAESYPNCVQVEVSGSGDKKPDQNFDFNKGYTCDNKGLHFKIYIGQDSQYVAPGPRPWNGS

The polynucleotide (SEQ ID NO:102) and amino acid (SEQ ID NO:103) sequences of an M. thermophila GH61t are provided below.

(SEQ ID NO: 102) ATGTTCACTTCGCTTTGCATCACAGATCATTGGAGGACTCTTAGCAGCCACTCTGGGCCAGTCATGAACTATCTCGCCCATTGCACCAATGACGACTGCAAGTCTTTCAAGGGCGACAGCGGCAACGTCTGGGTCAAGATCGAGCAGCTCGCGTACAACCCGTCAGCCAACCCCCCCTGGGCGTCTGACCTCCTCCGTGAGCACGGTGCCAAGTGGAAGGTGACGATCCCGCCCAGTCTTGTCCCCGGCGAATATCTGCTGCGGCACGAGATCCTGGGGTTGCACGTCGCAGGAACCGTGATGGGCGCCCAGTTCTACCCCGGCTGCACCCAGATCAGGGTCACCGAAGGCGGGAGCACGCAGCTGCCCTCGGGTATTGCGCTCCCAGGCGCTTACGGCCCACAAGACGAGGGTATCTTGGTCGACTTGTGGAGGGTTAACCAGGGCCAGGTCAACTACACGGCGCCTGGAGGACCCGTTTGGAGCGAAGCGTGGGACACCGAGTTTGGCGGGTCCAACACGACCGAGTGCGCCACCATGCTCGACGACCTGCTCGACTACATGGCGGCCAACGACGAGTGGATCGGCTGGACGGCCTAG (SEQ ID NO: 103)MFTSLCITDHWRTLSSHSGPVMNYLAHCTNDDCKSFKGDSGNVWVKIEQLAYNPSANPPWASDLLREHGAKWKVTIPPSLVPGEYLLRHEILGLHVAGTVMGAQFYPGCTQIRVTEGGSTQLPSGIALPGAYGPQDEGILVDLWRVNQGQVNYTAPGGPVWSEAWDTEFGGSNTTECATMLDDLLDYMAANDEWIGWTA

The polynucleotide (SEQ ID NO:104) and amino acid (SEQ ID NO:105) sequences of an alternative M. thermophila GH61t are provided below.

(SEQ ID NO: 104) ATGAACTATCTCGCCCATTGCACCAATGACGACTGCAAGTCTTTCAAGGGCGACAGCGGCAACGTCTGGGTCAAGATCGAGCAGCTCGCGTACAACCCGTCAGCCAACCCCCCCTGGGCGTCTGACCTCCTCCGTGAGCACGGTGCCAAGTGGAAGGTGACGATCCCGCCCAGTCTTGTCCCCGGCGAATATCTGCTGCGGCACGAGATCCTGGGGTTGCACGTCGCAGGAACCGTGATGGGCGCCCAGTTCTACCCCGGCTGCACCCAGATCAGGGTCACCGAAGGCGGGAGCACGCAGCTGCCCTCGGGTATTGCGCTCCCAGGCGCTTACGGCCCACAAGACGAGGGTATCTTGGTCGACTTGTGGAGGGTTAACCAGGGCCAGGTCAACTACACGGCGCCTGGAGGACCCGTTTGGAGCGAAGCGTGGGACACCGAGTTTGGCGGGTCCAACACGACCGAGTGCGCCACCATGCTCGACGACCTGCTCGACTACATGGCGGCCAACGACGACCCATGCTGCACCGACCAGAACCAGTTCGGGAGTCTCGAGCCGGGGAGCAAGGCGGCCGGCGGCTCGCCGAGCCTGTACGATACCGTCTTGGTCCCCGTTCTCCAGAAGAAAGTGCCGACAAAGCTGCAGTGGAGCGGACCGGCGAGCGTCAACGGGGATGAGTTGACAGAGAGGCCC (SEQ ID NO: 105)MNYLAHCTNDDCKSFKGDSGNVWVKIEQLAYNPSANPPWASDLLREHGAKWKVTIPPSLVPGEYLLRHEILGLHVAGTVMGAQFYPGCTQIRVTEGGSTQLPSGIALPGAYGPQDEGILVDLWRVNQGQVNYTAPGGPVWSEAWDTEFGGSNTTECATMLDDLLDYMAANDDPCCTDQNQFGSLEPGSKAAGGSPSLYDTVLVPVLQKKVPTKLQWSGPASVNGDELTERP

The polynucleotide (SEQ ID NO:106) and amino acid (SEQ ID NO:107) sequences of an M. thermophila GH61u are provided below. The signal sequence is shown underlined in SEQ ID NO:107. SEQ ID NO:108 provides the sequence of this GH61u without the signal sequence.

(SEQ ID NO: 106) ATGAAGCTGAGCGCTGCCATCGCCGTGCTCGCGGCCGCCCTTGCCGAGGGGCACTATACCTTCCCCAGCATCGCCAACACGGCCGACTGGCAATATGTGCGCATCACGACCAACTTCCAGAGCAACGGCCCCGTGACGGACGTCAACTCGGACCAGATCCGGTGCTACGAGCGCAACCCGGGCACCGGCGCCCCCGGCATCTACAACGTCACGGCCGGCACAACCATCAACTACAACGCCAAGTCGTCCATCTCCCACCCGGGACCCATGGCCTTCTACATTGCCAAGGTTCCCGCCGGCCAGTCGGCCGCCACCTGGGACGGTAAGGGCGCCGTCTGGTCCAAGATCCACCAGGAGATGCCGCACTTTGGCACCAGCCTCACCTGGGACTCCAACGGCCGCACCTCCATGCCCGTCACCATCCCCCGCTGTCTGCAGGACGGCGAGTATCTGCTGCGTGCAGAGCACATTGCCCTCCACAGCGCCGGCAGCCCCGGCGGCGCCCAGTTCTACATTTCTTGTGCCCAGCTCTCAGTCACCGGCGGCAGCGGGACCTGGAACCCCAGGAACAAGGTGTCGTTCCCCGGCGCCTACAAGGCCACTGACCCGGGCATCCTGATCAACATCTACTACCCCGTCCCGACTAGCTACACTCCCGCTGGTCCCCCCGTCGACACCTGC (SEQ ID NO: 107)MKLSAAIAVLAAALAEGHYTFPSIANTADWQYVRITTNFQSNGPVTDVNSDQIRCYERNPGTGAPGIYNVTAGTTINYNAKSSISHPGPMAFYIAKVPAGQSAATWDGKGAVWSKIHQEMPHFGTSLTWDSNGRTSMPVTIPRCLQDGEYLLRAEHIALHSAGSPGGAQFYISCAQLSVTGGSGTWNPRNKVSFPGAYKATDPGILINIYYPVPTSYTPAGPPVDTC (SEQ ID NO: 108)HYTFPSIANTADWQYVRITTNFQSNGPVTDVNSDQIRCYERNPGTGAPGIYNVTAGTTINYNAKSSISHPGPMAFYIAKVPAGQSAATWDGKGAVWSKIHQEMPHFGTSLTWDSNGRTSMPVTIPRCLQDGEYLLRAEHIALHSAGSPGGAQFYISCAQLSVTGGSGTWNPRNKVSFPGAYKATDPGILINIYYPVPTSY TPAGPPVDTC

The polynucleotide (SEQ ID NO:109) and amino acid (SEQ ID NO:110) sequences of an M. thermophila GH61v are provided below. The signal sequence is shown underlined in SEQ ID NO:110. SEQ ID NO:111 provides the sequence of this GH61v without the signal sequence.

(SEQ ID NO: 109) ATGTACCGCACGCTCGGTTCCATTGCCCTGCTCGCGGGGGGCGCTGCCGCCCACGGCGCCGTGACCAGCTACAACATTGCGGGCAAGGACTACCCTGGATACTCGGGCTTCGCCCCTACCGGCCAGGATGTCATCCAGTGGCAATGGCCCGACTATAACCCCGTGCTGTCCGCCAGCGACCCCAAGCTCCGCTGCAACGGCGGCACCGGGGCGGCGCTGTATGCCGAGGCGGCCCCCGGCGACACCATCACGGCCACCTGGGCCCAGTGGACGCACTCCCAGGGCCCGATCCTGGTGTGGATGTACAAGTGCCCCGGCGACTTCAGCTCCTGCGACGGCTCCGGCGCGGGTTGGTTCAAGATCGACGAGGCCGGCTTCCACGGCGACGGCACGACCGTCTTCCTCGACACCGAGACCCCCTCGGGCTGGGACATTGCCAAGCTGGTCGGCGGCAACAAGTCGTGGAGCAGCAAGATCCCTGACGGCCTCGCCCCGGGCAATTACCTGGTCCGCCACGAGCTCATCGCCCTGCACCAGGCCAACAACCCGCAATTCTACCCCGAGTGCGCCCAGATCAAGGTCACCGGCTCTGGCACCGCCGAGCCCGCCGCCTCCTACAAGGCCGCCATCCCCGGCTACTGCCAGCAGAGCGACCCCAACATTTCGTTCAACATCAACGACCACTCCCTCCCGCAGGAGTACAAGATCCCCGGTCCCCCGGTCTTCAAGGGCACCGCCTCCGCCAAGGCT CGCGCTTTCCAGGCC (SEQID NO: 110) MYRTLGSIALLAGGAAAHGAVTSYNIAGKDYPGYSGFAPTGQDVIQWQWPDYNPVLSASDPKLRCNGGTGAALYAEAAPGDTITATWAQWTHSQGPILVWMYKCPGDFSSCDGSGAGWFKIDEAGFHGDGTTVFLDTETPSGWDIAKLVGGNKSWSSKIPDGLAPGNYLVRHELIALHQANNPQFYPECAQIKVTGSGTAEPAASYKAAIPGYCQQSDPNISFNINDHSLPQEYKIPGPPVFKGTASAKA RAFQA (SEQ ID NO:111) AVTSYNIAGKDYPGYSGFAPTGQDVIQWQWPDYNPVLSASDPKLRCNGGTGAALYAEAAPGDTITATWAQWTHSQGPILVWMYKCPGDFSSCDGSGAGWFKIDEAGFHGDGTTVFLDTETPSGWDIAKLVGGNKSWSSKIPDGLAPGNYLVRHELIALHQANNPQFYPECAQIKVTGSGTAEPAASYKAAIPGYCQQSDPNISFNINDHSLPQEYKIPGPPVFKGTASAKARAFQA

The polynucleotide (SEQ ID NO:112) and amino acid (SEQ ID NO:113) sequences of an M. thermophila GH61w are provided below. The signal sequence is shown underlined in SEQ ID NO:113. SEQ ID NO:114 provides the sequence of this GH61w without the signal sequence.

(SEQ ID NO: 112) ATGCTGACAACAACCTTCGCCCTCCTGACGGCCGCTCTCGGCGTCAGCGCCCATTATACCCTCCCCAGGGTCGGGACCGGTTCCGACTGGCAGCACGTGCGGCGGGCTGACAACTGGCAAAACAACGGCTTCGTCGGCGACGTCAACTCGGAGCAGATCAGGTGCTTCCAGGCGACCCCTGCCGGCGCCCAAGACGTCTACACTGTTCAGGCGGGATCGACCGTGACCTACCACGCCAACCCCAGTATCTACCACCCCGGCCCCATGCAGTTCTACCTGGCCCGCGTTCCGGACGGACAGGACGTCAAGTCGTGGACCGGCGAGGGTGCCGTGTGGTTCAAGGTGTACGAGGAGCAGCCTCAATTTGGCGCCCAGCTGACCTGGCCTAGCAACGGCAAGAGCTCGTTCGAGGTTCCTATCCCCAGCTGCATTCGGGCGGGCAACTACCTCCTCCGCGCTGAGCACATCGCCCTGCACGTTGCCCAAAGCCAGGGCGGCGCCCAGTTCTACATCTCGTGCGCCCAGCTCCAGGTCACTGGTGGCGGCAGCACCGAGCCTTCTCAGAAGGTTTCCTTCCCGGGTGCCTACAAGTCCACCGACCCCGGCATTCTTATCAACATCAACTACCCCGTCCCTACCTCGTACCAGAATCCGGGTCCGGCTGTCTTCCGTTGC (SEQ ID NO: 113)MLTTTFALLTAALGVSAHYTLPRVGTGSDWQHVRRADNWQNNGFVGDVNSEQIRCFQATPAGAQDVYTVQAGSTVTYHANPSIYHPGPMQFYLARVPDGQDVKSWTGEGAVWFKVYEEQPQFGAQLTWPSNGKSSFEVPIPSCIRAGNYLLRAEHIALHVAQSQGGAQFYISCAQLQVTGGGSTEPSQKVSFPGAYKSTDPGILININYPVPTSYQNPGPAVFRC (SEQ ID NO: 114)HYTLPRVGTGSDWQHVRRADNWQNNGFVGDVNSEQIRCFQATPAGAQDVYTVQAGSTVTYHANPSIYHPGPMQFYLARVPDGQDVKSWTGEGAVWFKVYEEQPQFGAQLTWPSNGKSSFEVPIPSCIRAGNYLLRAEHIALHVAQSQGGAQFYISCAQLQVTGGGSTEPSQKVSFPGAYKSTDPGILININYPVPTSYQN PGPAVFRC

The polynucleotide (SEQ ID NO:115) and amino acid (SEQ ID NO:116) sequences of an M. thermophila GH61x are provided below. The signal sequence is shown underlined in SEQ ID NO:116. SEQ ID NO:117 provides the sequence of this GH61x without the signal sequence.

(SEQ ID NO: 115) ATGAAGGTTCTCGCGCCCCTGATTCTGGCCGGTGCCGCCAGCGCCCACACCATCTTCTCATCCCTCGAGGTGGGCGGCGTCAACCAGGGCATCGGGCAGGGTGTCCGCGTGCCGTCGTACAACGGTCCGATCGAGGACGTGACGTCCAACTCGATCGCCTGCAACGGGCCCCCCAACCCGACGACGCCGACCAACAAGGTCATCACGGTCCGGGCCGGCGAGACGGTGACGGCCGTCTGGCGGTACATGCTGAGCACCACCGGCTCGGCCCCCAACGACATCATGGACAGCAGCCACAAGGGCCCGACCATGGCCTACCTCAAGAAGGTCGACAACGCCACCACCGACTCGGGCGTCGGCGGCGGCTGGTTCAAGATCCAGGAGGACGGCCTTACCAACGGCGTCTGGGGCACCGAGCGCGTCATCAACGGCCAGGGCCGCCACAACATCAAGATCCCCGAGTGCATCGCCCCCGGCCAGTACCTCCTCCGCGCCGAGATGCTTGCCCTGCACGGAGCTTCCAACTACCCCGGCGCTCAGTTCTACATGGAGTGCGCCCAGCTCAATATCGTCGGCGGCACCGGCAGCAAGACGCCGTCCACCGTCAGCTTCCCGGGCGCTTACAAGGGTACCGACCCCGGAGTCAAGATCAACATCTACTGGCCCCCCGTCACCAGCTACCAGATTCCCGGCCCCGGCG TGTTCACCTGC (SEQ IDNO: 116) MKVLAPLILAGAASAHTIFSSLEVGGVNQGIGQGVRVPSYNGPIEDVTSNSIACNGPPNPTTPTNKVITVRAGETVTAVWRYMLSTTGSAPNDIMDSSHKGPTMAYLKKVDNATTDSGVGGGWFKIQEDGLTNGVWGTERVINGQGRHNIKIPECIAPGQYLLRAEMLALHGASNYPGAQFYMECAQLNIVGGTGSKTPSTVSFPGAYKGTDPGVKINIYWPPVTSYQIPGPGVFTC (SEQ ID NO: 117)HTIFSSLEVGGVNQGIGQGVRVPSYNGPIEDVTSNSIACNGPPNPTTPTNKVITVRAGETVTAVWRYMLSTTGSAPNDIMDSSHKGPTMAYLKKVDNATTDSGVGGGWFKIQEDGLTNGVWGTERVINGQGRHNIKIPECIAPGQYLLRAEMLALHGASNYPGAQFYMECAQLNIVGGTGSKTPSTVSFPGAYKGTDPGVKINIYWPPVTSYQIPGPGVFTC

The polynucleotide (SEQ ID NO:118) and amino acid (SEQ ID NO:119) sequences of an M. thermophila GH61y are provided below. The signal sequence is underlined in SEQ ID NO:119. SEQ ID NO:120 provides the sequence of GH61y, without the signal sequence.

(SEQ ID NO: 118) ATGATCGACAACCTCCCTGATGACTCCCTACAACCCGCCTGCCTCCGCCCGGGCCACTACCTCGTCCGCCACGAGATCATCGCGCTGCACTCGGCCTGGGCCGAGGGCGAGGCCCAGTTCTACCCCTTCCCCCTTTTTCCTTTTTTTCCCTCCCTTCTTTTGTCCGGTAACTACACGATTCCCGGTCCCGCGATCTGGAAGTGCCCAGAGGCACAGCAGAACGAG (SEQ ID NO: 119)MIDNLPDDSLQPACLRPGHYLVRHEIIALHSAWAEGEAQFYPFPLFPFFPSLLLSGNYTIPGPAIWKCPEAQQNE (SEQ ID NO: 120)HYLVRHEIIALHSAWAEGEAQFYPFPLFPFFPSLLLSGNYTIPGPAIWKCPEAQQNE

Wild-type EG1b cDNA (SEQ ID NO:121) and amino acid (SEQ ID NO:122) sequences are provided below. The signal sequence is underlined in SEQ ID NO:122. SEQ ID NO:123 provides the sequence of EG1b, without the signal sequence.

(SEQ ID NO: 121) ATGGGGCAGAAGACTCTCCAGGGGCTGGTGGCGGCGGCGGCACTGGCAGCCTCGGTGGCGAACGCGCAGCAACCGGGCACCTTCACGCCCGAGGTGCATCCGACGCTGCCGACGTGGAAGTGCACGACGAGCGGCGGGTGCGTCCAGCAGGACACGTCGGTGGTGCTCGACTGGAACTACCGCTGGTTCCACACCGAGGACGGTAGCAAGTCGTGCATCACCTCTAGCGGCGTCGACCGGACCCTGTGCCCGGACGAGGCGACGTGCGCCAAGAACTGCTTCGTCGAGGGCGTCAACTACACGAGCAGCGGGGTCGAGACGTCCGGCAGCTCCCTCACCCTCCGCCAGTTCTTCAAGGGCTCCGACGGCGCCATCAACAGCGTCTCCCCGCGCGTCTACCTGCTCGGGGGAGACGGCAACTATGTCGTGCTCAAGCTCCTCGGCCAGGAGCTGAGCTTCGACGTGGACGTATCGTCGCTCCCGTGCGGCGAGAACGCGGCCCTGTACCTGTCCGAGATGGACGCGACGGGAGGACGGAACGAGTACAACACGGGCGGGGCCGAGTACGGGTCGGGCTACTGTGACGCCCAGTGCCCCGTGCAGAACTGGAACAACGGGACGCTCAACACGGGCCGGGTGGGCTCGTGCTGCAACGAGATGGACATCCTCGAGGCCAACTCCAAGGCCGAGGCCTTCACGCCGCACCCCTGCATCGGCAACTCGTGCGACAAGAGCGGGTGCGGCTTCAACGCGTACGCGCGCGGTTACCACAACTACTGGGCCCCCGGCGGCACGCTCGACACGTCCCGGCCTTTCACCATGATCACCCGCTTCGTCACCGACGACGGCACCACCTCGGGCAAGCTCGCCCGCATCGAGCGCGTCTACGTCCAGGACGGCAAGAAGGTGCCCAGCGCGGCGCCCGGGGGGGACGTCATCACGGCCGACGGGTGCACCTCCGCGCAGCCCTACGGCGGCCTTTCCGGCATGGGCGACGCCCTCGGCCGCGGCATGGTCCTGGCCCTGAGCATCTGGAACGACGCGTCCGGGTACATGAACTGGCTCGACGCCGGCAGCAACGGCCCCTGCAGCGACACCGAGGGTAACCCGTCCAACATCCTGGCCAACCACCCGGACGCCCACGTCGTGCTCTCCAACATCCGCTGGGGCGACATCGGCTCCACCGTCGACACCGGCGATGGCGACAACAACGGCGGCGGCCCCAACCCGTCATCCACCACCACCGCTACCGCTACCACCACCTCCTCCGGCCCGGCCGAGCCTACCCAGACCCACTACGGCCAGTGTGGAGGGAAAGGATGGACGGGCCCTACCCGCTGCGAGACGCCCTACACCTGCAAGTACCAGAACGACTGGTACTCGCAGTGCCTGTAG (SEQ ID NO: 122)MGQKTLQGLVAAAALAASVANAQQPGTFTPEVHPTLPTWKCTTSGGCVQQDTSVVLDWNYRWFHTEDGSKSCITSSGVDRTLCPDEATCAKNCFVEGVNYTSSGVETSGSSLTLRQFFKGSDGAINSVSPRVYLLGGDGNYVVLKLLGQELSFDVDVSSLPCGENAALYLSEMDATGGRNEYNTGGAEYGSGYCDAQCPVQNWNNGTLNTGRVGSCCNEMDILEANSKAEAFTPHPCIGNSCDKSGCGFNAYARGYHNYWAPGGTLDTSRPFTMITRFVTDDGTTSGKLARIERVYVQDGKKVPSAAPGGDVITADGCTSAQPYGGLSGMGDALGRGMVLALSIWNDASGYMNWLDAGSNGPCSDTEGNPSNILANHPDAHVVLSNIRWGDIGSTVDTGDGDNNGGGPNPSSTTTATATTTSSGPAEPTQTHYGQCGGKGWTGPTRCETP YTCKYQNDWYSQCL (SEQID NO: 123) 
QQPGTFTPEVHPTLPTWKCTTSGGCVQQDTSVVLDWNYRWFHTEDGSKSCITSSGVDRTLCPDEATCAKNCFVEGVNYTSSGVETSGSSLTLRQFFKGSDGAINSVSPRVYLLGGDGNYVVLKLLGQELSFDVDVSSLPCGENAALYLSEMDATGGRNEYNTGGAEYGSGYCDAQCPVQNWNNGTLNTGRVGSCCNEMDILEANSKAEAFTPHPCIGNSCDKSGCGFNAYARGYHNYWAPGGTLDTSRPFTMITRFVTDDGTTSGKLARIERVYVQDGKKVPSAAPGGDVITADGCTSAQPYGGLSGMGDALGRGMVLALSIWNDASGYMNWLDAGSNGPCSDTEGNPSNILANHPDAHVVLSNIRWGDIGSTVDTGDGDNNGGGPNPSSTTTATATTTSSGPAEPTQTHYGQCGGKGWTGPTRCETPYTCKYQNDWYSQCL

Wild-type M. thermophila EG2 polynucleotide (SEQ ID NO:124) and amino acid (SEQ ID NO:125) sequences are provided below. The signal sequence is underlined in SEQ ID NO:125. SEQ ID NO:126 provides the sequence of EG2, without the signal sequence.

(SEQ ID NO: 124) ATGAAGTCCTCCATCCTCGCCAGCGTCTTCGCCACGGGCGCCGTGGCTCAAAGTGGTCCGTGGCAGCAATGTGGTGGCATCGGATGGCAAGGATCGACCGACTGTGTGTCGGGTTACCACTGCGTCTACCAGAACGATTGGTACAGCCAGTGCGTGCCTGGCGCGGCGTCGACAACGCTCCAGACATCTACCACGTCCAGGCCCACCGCCACCAGCACCGCCCCTCCGTCGTCCACCACCTCGCCTAGCAAGGGCAAGCTCAAGTGGCTCGGCAGCAACGAGTCGGGCGCCGAGTTCGGGGAGGGCAACTACCCCGGCCTCTGGGGCAAGCACTTCATCTTCCCGTCGACTTCGGCGATTCAGACGCTCATCAATGATGGATACAACATCTTCCGGATCGACTTCTCGATGGAGCGTCTGGTGCCCAACCAGTTGACGTCGTCCTTCGACGAGGGCTACCTCCGCAACCTGACCGAGGTGGTCAACTTCGTGACGAACGCGGGCAAGTACGCCGTCCTGGACCCGCACAACTACGGCCGGTACTACGGCAACGTCATCACGGACACGAACGCGTTCCGGACCTTCTGGACCAACCTGGCCAAGCAGTTCGCCTCCAACTCGCTCGTCATCTTCGACACCAACAACGAGTACAACACGATGGACCAGACCCTGGTGCTCAACCTCAACCAGGCCGCCATCGACGGCATCCGGGCCGCCGGCGCGACCTCGCAGTACATCTTCGTCGAGGGCAACGCGTGGAGCGGGGCCTGGAGCTGGAACACGACCAACACCAACATGGCCGCCCTGACGGACCCGCAGAACAAGATCGTGTACGAGATGCACCAGTACCTCGACTCGGACAGCTCGGGCACCCACGCCGAGTGCGTCAGCAGCAACATCGGCGCCCAGCGCGTCGTCGGAGCCACCCAGTGGCTCCGCGCCAACGGCAAGCTCGGCGTCCTCGGCGAGTTCGCCGGCGGCGCCAACGCCGTCTGCCAGCAGGCCGTCACCGGCCTCCTCGACCACCTCCAGGACAACAGCGACGTCTGGCTGGGTGCCCTCTGGTGGGCCGCCGGTCCCTGGTGGGGCGACTACATGTACTCGTTCGAGCCTCCTTCGGGCACCGGCTATGTCAACTACAACTCGATCC TAAAGAAGTACTTGCCGTAA(SEQ ID NO: 125) MKSSILASVFATGAVAQSGPWQQCGGIGWQGSTDCVSGYHCVYQNDWYSQCVPGAASTTLQTSTTSRPTATSTAPPSSTTSPSKGKLKWLGSNESGAEFGEGNYPGLWGKHFIFPSTSAIQTLINDGYNIFRIDFSMERLVPNQLTSSFDEGYLRNLTEVVNFVTNAGKYAVLDPHNYGRYYGNVITDTNAFRTFWTNLAKQFASNSLVIFDTNNEYNTMDQTLVLNLNQAAIDGIRAAGATSQYIFVEGNAWSGAWSWNTTNTNMAALTDPQNKIVYEMHQYLDSDSSGTHAECVSSNIGAQRVVGATQWLRANGKLGVLGEFAGGANAVCQQAVTGLLDHLQDNSEVWLGALWWAAGPWWGDYMYSFEPPSGTGYVNYNSILKKYLP (SEQ ID NO: 126)QSGPWQQCGGIGWQGSTDCVSGYHCVYQNDWYSQCVPGAASTTLQTSTTSRPTATSTAPPSSTTSPSKGKLKWLGSNESGAEFGEGNYPGLWGKHFIFPSTSAIQTLINDGYNIFRIDFSMERLVPNQLTSSFDEGYLRNLTEVVNFVTNAGKYAVLDPHNYGRYYGNVITDTNAFRTFWTNLAKQFASNSLVIFDTNNEYNTMDQTLVLNLNQAAIDGIRAAGATSQYIFVEGNAWSGAWSWNTTNTNMAALTDPQNKIVYEMHQYLDSDSSGTHAECVSSNIGAQRVVGATQWLRANGKLGVLGEFAGGANAVCQQAVTGLLDHLQDNSEVWLGALWWAAGPWWGDYMYSFEPPSGTGYVNYNSILKKYLP

The polynucleotide (SEQ ID NO:127) and amino acid (SEQ ID NO:128) sequences of a wild-type BGL are provided below. The signal sequence is underlined in SEQ ID NO:128. SEQ ID NO:129 provides the polypeptide sequence without the signal sequence.

(SEQ ID NO: 127) ATGAAGGCTGCTGCGCTTTCCTGCCTCTTCGGCAGTACCCTTGCCGTTGCAGGCGCCATTGAATCGAGAAAGGTTCACCAGAAGCCCCTCGCGAGATCTGAACCTTTTTACCCGTCGCCATGGATGAATCCCAACGCCGACGGCTGGGCGGAGGCCTATGCCCAGGCCAAGTCCTTTGTCTCCCAAATGACTCTGCTAGAGAAGGTCAACTTGACCACGGGAGTCGGCTGGGGGGCTGAGCAGTGCGTCGGCCAAGTGGGCGCGATCCCTCGCCTTGGACTTCGCAGTCTGTGCATGCATGACTCCCCTCTCGGCATCCGAGGAGCCGACTACAACTCAGCGTTCCCCTCTGGCCAGACCGTTGCTGCTACCTGGGATCGCGGTCTGATGTACCGTCGCGGCTACGCAATGGGCCAGGAGGCCAAAGGCAAGGGCATCAATGTCCTTCTCGGACCAGTCGCCGGCCCCCTTGGCCGCATGCCCGAGGGCGGTCGTAACTGGGAAGGCTTCGCTCCGGATCCCGTCCTTACCGGCATCGGCATGTCCGAGACGATCAAGGGCATTCAGGATGCTGGCGTCATCGCTTGTGCGAAGCACTTTATTGGAAACGAGCAGGAGCACTTCAGACAGGTGCCAGAAGCCCAGGGATACGGTTACAACATCAGCGAAACCCTCTCCTCCAACATTGACGACAAGACCATGCACGAGCTCTACCTTTGGCCGTTTGCCGATGCCGTCCGGGCCGGCGTCGGCTCTGTCATGTGCTCGTACCAGCAGGTCAACAACTCGTACGCCTGCCAGAACTCGAAGCTGCTGAACGACCTCCTCAAGAACGAGCTTGGGTTTCAGGGCTTCGTCATGAGCGACTGGCAGGCACAGCACACTGGCGCAGCAAGCGCCGTGGCTGGTCTCGATATGTCCATGCCGGGCGACACCCAGTTCAACACTGGCGTCAGTTTCTGGGGCGCCAATCTCACCCTCGCCGTCCTCAACGGCACAGTCCCTGCCTACCGTCTCGACGACATGGCCATGCGCATCATGGCCGCCCTCTTCAAGGTCACCAAGACCACCGACCTGGAACCGATCAACTTCTCCTTCTGGACCGACGACACTTATGGCCCGATCCACTGGGCCGCCAAGCAGGGCTACCAGGAGATTAATTCCCACGTTGACGTCCGCGCCGACCACGGCAACCTCATCCGGGAGATTGCCGCCAAGGGTACGGTGCTGCTGAAGAATACCGGCTCTCTACCCCTGAACAAGCCAAAGTTCGTGGCCGTCATCGGCGAGGATGCTGGGTCGAGCCCCAACGGGCCCAACGGCTGCAGCGACCGCGGCTGTAACGAAGGCACGCTCGCCATGGGCTGGGGATCCGGCACAGCCAACTATCCGTACCTCGTTTCCCCCGACGCCGCGCTCCAGGCCCGGGCCATCCAGGACGGCACGAGGTACGAGAGCGTCCTGTCCAACTACGCCGAGGAAAAGACAAAGGCTCTGGTCTCGCAGGCCAATGCAACCGCCATCGTCTTCGTCAATGCCGACTCAGGCGAGGGCTACATCAACGTGGACGGTAACGAGGGCGACCGTAAGAACCTGACTCTCTGGAACAACGGTGATACTCTGGTCAAGAACGTCTCGAGCTGGTGCAGCAACACCATCGTCGTCATCCACTCGGTCGGCCCGGTCCTCCTGACCGATTGGTACGACAACCCCAACATCACGGCCATTCTCTGGGCTGGTCTTCCGGGCCAGGAGTCGGGCAACTCCATCACCGACGTGCTTTACGGCAAGGTCAACCCCGCCGCCCGCTCGCCCTTCACTTGGGGCAAGACCCGCGAAAGCTATGGCGCGGACGTCCTGTACAAGCCGAATAATGGCAATGGTGCGCCCCAACAGGACTTCACCGAGGGCGTCTTCATCGACTACCGCTACTTCGACAAGGTTGACGATGACTCGGTCATCTACGAGTTCGGCCACGGCCTG
AGCTACACCACCTTCGAGTACAGCAACATCCGCGTCGTCAAGTCCAACGTCAGCGAGTACCGGCCCACGACGGGCACCACGGCCCAGGCCCCGACGTTTGGCAACTTCTCCACCGACCTCGAGGACTATCTCTTCCCCAAGGACGAGTTCCCCTACATCTACCAGTACATCTACCCGTACCTCAACACGACCGACCCCCGGAGGGCCTCGGCCGATCCCCACTACGGCCAGACCGCCGAGGAGTTCCTCCCGCCCCACGCCACCGATGACGACCCCCAGCCGCTCCTCCGGTCCTCGGGCGGAAACTCCCCCGGCGGCAACCGCCAGCTGTACGACATTGTCTACACAATCACGGCCGACATCACGAATACGGGCTCCGTTGTAGGCGAGGAGGTACCGCAGCTCTACGTCTCGCTGGGCGGTCCCGAGGATCCCAAGGTGCAGCTGCGCGACTTTGACAGGATGCGGATCGAACCCGGCGAGACGAGGCAGTTCACCGGCCGCCTGACGCGCAGAGATCTGAGCAACTGGGACGTCACGGTGCAGGACTGGGTCATCAGCAGGTATCCCAAGACGGCATATGTTGGGAGGAGCAGCCGGAAGTTGGATCTCAAGAT TGAGCTTCCTTGA (SEQ IDNO: 128) MKAAALSCLFGSTLAVAGAIESRKVHQKPLARSEPFYPSPWMNPNADGWAEAYAQAKSFVSQMTLLEKVNLTTGVGWGAEQCVGQVGAIPRLGLRSLCMHDSPLGIRGADYNSAFPSGQTVAATWDRGLMYRRGYAMGQEAKGKGINVLLGPVAGPLGRMPEGGRNWEGFAPDPVLTGIGMSETIKGIQDAGVIACAKHFIGNEQEHFRQVPEAQGYGYNISETLSSNIDDKTMHELYLWPFADAVRAGVGSVMCSYQQVNNSYACQNSKLLNDLLKNELGFQGFVMSDWQAQHTGAASAVAGLDMSMPGDTQFNTGVSFWGANLTLAVLNGTVPAYRLDDMAMRIMAALFKVTKTTDLEPINFSFWTDDTYGPIHWAAKQGYQEINSHVDVRADHGNLIREIAAKGTVLLKNTGSLPLNKPKFVAVIGEDAGSSPNGPNGCSDRGCNEGTLAMGWGSGTANYPYLVSPDAALQARAIQDGTRYESVLSNYAEEKTKALVSQANATAIVFVNADSGEGYINVDGNEGDRKNLTLWNNGDTLVKNVSSWCSNTIVVIHSVGPVLLTDWYDNPNITAILWAGLPGQESGNSITDVLYGKVNPAARSPFTWGKTRESYGADVLYKPNNGNGAPQQDFTEGVFIDYRYFDKVDDDSVIYEFGHGLSYTTFEYSNIRVVKSNVSEYRPTTGTTAQAPTFGNFSTDLEDYLFPKDEFPYIYQYIYPYLNTTDPRRASADPHYGQTAEEFLPPHATDDDPQPLLRSSGGNSPGGNRQLYDIVYTITADITNTGSVVGEEVPQLYVSLGGPEDPKVQLRDFDRMRIEPGETRQFTGRLTRRDLSNWDVTVQDWVISRY PKTAYVGRSSRKLDLKIELP(SEQ ID NO: 129) 
IESRKVHQKPLARSEPFYPSPWMNPNADGWAEAYAQAKSFVSQMTLLEKVNLTTGVGWGAEQCVGQVGAIPRLGLRSLCMHDSPLGIRGADYNSAFPSGQTVAATWDRGLMYRRGYAMGQEAKGKGINVLLGPVAGPLGRMPEGGRNWEGFAPDPVLTGIGMSETIKGIQDAGVIACAKHFIGNEQEHFRQVPEAQGYGYNISETLSSNIDDKTMHELYLWPFADAVRAGVGSVMCSYQQVNNSYACQNSKLLNDLLKNELGFQGFVMSDWQAQHTGAASAVAGLDMSMPGDTQFNTGVSFWGANLTLAVLNGTVPAYRLDDMAMRIMAALFKVTKTTDLEPINFSFWTDDTYGPIHWAAKQGYQEINSHVDVRADHGNLIREIAAKGTVLLKNTGSLPLNKPKFVAVIGEDAGSSPNGPNGCSDRGCNEGTLAMGWGSGTANYPYLVSPDAALQARAIQDGTRYESVLSNYAEEKTKALVSQANATAIVFVNADSGEGYINVDGNEGDRKNLTLWNNGDTLVKNVSSWCSNTIVVIHSVGPVLLTDWYDNPNITAILWAGLPGQESGNSITDVLYGKVNPAARSPFTWGKTRESYGADVLYKPNNGNGAPQQDFTEGVFIDYRYFDKVDDDSVIYEFGHGLSYTTFEYSNIRVVKSNVSEYRPTTGTTAQAPTFGNFSTDLEDYLFPKDEFPYIYQYIYPYLNTTDPRRASADPHYGQTAEEFLPPHATDDDPQPLLRSSGGNSPGGNRQLYDIVYTITADITNTGSVVGEEVPQLYVSLGGPEDPKVQLRDFDRMRIEPGETRQFTGRLTRRDLSNWDVTVQDWVISRYPKTAYVGRSSRKLDLKIEL P

The polynucleotide (SEQ ID NO:130) and amino acid (SEQ ID NO:131) sequences of a BGL variant (“Variant 883”) are provided below. The signal sequence is underlined in SEQ ID NO:131. SEQ ID NO:132 provides the sequence of this BGL variant, without the signal sequence.

(SEQ ID NO: 130) ATGAAGGCTGCTGCGCTTTCCTGCCTCTTCGGCAGTACCCTTGCCGTTGCAGGCGCCATTGAATCGAGAAAGGTTCACCAGAAGCCCCTCGCGAGATCTGAACCTTTTTACCCGTCGCCATGGATGAATCCCAACGCCGACGGCTGGGCGGAGGCCTATGCCCAGGCCAAGTCCTTTGTCTCCCAAATGACTCTGCTAGAGAAGGTCAACTTGACCACGGGAGTCGGCTGGGGGGCTGAGCAGTGCGTCGGCCAAGTGGGCGCGATCCCTCGCCTTGGACTTCGCAGTCTGTGCATGCATGACTCCCCTCTCGGCATCCGAGGAGCCGACTACAACTCAGCGTTCCCCTCTGGCCAGACCGTTGCTGCTACCTGGGATCGCGGTCTGATGTACCGTCGCGGCTACGCAATGGGCCAGGAGGCCAAAGGCAAGGGCATCAATGTCCTTCTCGGACCAGTCGCCGGCCCCCTTGGCCGCATGCCCGAGGGCGGTCGTAACTGGGAAGGCTTCGCTCCGGATCCCGTCCTTACCGGCATCGGCATGTCCGAGACGATCAAGGGCATTCAGGATGCTGGCGTCATCGCTTGTGCGAAGCACTTTATTGGAAACGAGCAGGAGCACTTCAGACAGGTGCCAGAAGCCCAGGGATACGGTTACAACATCAGCGAAACCCTCTCCTCCAACATTGACGACAAGACCATGCACGAGCTCTACCTTTGGCCGTTTGCCGATGCCGTCCGGGCCGGCGTCGGCTCTGTCATGTGCTCGTACAACCAGGTCAACAACTCGTACGCCTGCCAGAACTCGAAGCTGCTGAACGACCTCCTCAAGAACGAGCTTGGGTTTCAGGGCTTCGTCATGAGCGACTGGTGGGCACAGCACACTGGCGCAGCAAGCGCCGTGGCTGGTCTCGATATGTCCATGCCGGGCGACACCATGTTCAACACTGGCGTCAGTTTCTGGGGCGCCAATCTCACCCTCGCCGTCCTCAACGGCACAGTCCCTGCCTACCGTCTCGACGACATGGCCATGCGCATCATGGCCGCCCTCTTCAAGGTCACCAAGACCACCGACCTGGAACCGATCAACTTCTCCTTCTGGACCCGCGACACTTATGGCCCGATCCACTGGGCCGCCAAGCAGGGCTACCAGGAGATTAATTCCCACGTTGACGTCCGCGCCGACCACGGCAACCTCATCCGGAACATTGCCGCCAAGGGTACGGTGCTGCTGAAGAATACCGGCTCTCTACCCCTGAACAAGCCAAAGTTCGTGGCCGTCATCGGCGAGGATGCTGGGCCGAGCCCCAACGGGCCCAACGGCTGCAGCGACCGCGGCTGTAACGAAGGCACGCTCGCCATGGGCTGGGGATCCGGCACAGCCAACTATCCGTACCTCGTTTCCCCCGACGCCGCGCTCCAGTTGCGGGCCATCCAGGACGGCACGAGGTACGAGAGCGTCCTGTCCAACTACGCCGAGGAAAATACAAAGGCTCTGGTCTCGCAGGCCAATGCAACCGCCATCGTCTTCGTCAATGCCGACTCAGGCGAGGGCTACATCAACGTGGACGGTAACGAGGGCGACCGTAAGAACCTGACTCTCTGGAACAACGGTGATACTCTGGTCAAGAACGTCTCGAGCTGGTGCAGCAACACCATCGTCGTCATCCACTCGGTCGGCCCGGTCCTCCTGACCGATTGGTACGACAACCCCAACATCACGGCCATTCTCTGGGCTGGTCTTCCGGGCCAGGAGTCGGGCAACTCCATCACCGACGTGCTTTACGGCAAGGTCAACCCCGCCGCCCGCTCGCCCTTCACTTGGGGCAAGACCCGCGAAAGCTATGGCGCGGACGTCCTGTACAAGCCGAATAATGGCAATTGGGCGCCCCAACAGGACTTCACCGAGGGCGTCTTCATCGACTACCGCTACTTCGACAAGGTTGACGATGACTCGGTCATCTACGAGTTCGGCCACGGCCTG
AGCTACACCACCTTCGAGTACAGCAACATCCGCGTCGTCAAGTCCAACGTCAGCGAGTACCGGCCCACGACGGGCACCACGATTCAGGCCCCGACGTTTGGCAACTTCTCCACCGACCTCGAGGACTATCTCTTCCCCAAGGACGAGTTCCCCTACATCCCGCAGTACATCTACCCGTACCTCAACACGACCGACCCCCGGAGGGCCTCGGCCGATCCCCACTACGGCCAGACCGCCGAGGAGTTCCTCCCGCCCCACGCCACCGATGACGACCCCCAGCCGCTCCTCCGGTCCTCGGGCGGAAACTCCCCCGGCGGCAACCGCCAGCTGTACGACATTGTCTACACAATCACGGCCGACATCACGAATACGGGCTCCGTTGTAGGCGAGGAGGTACCGCAGCTCTACGTCTCGCTGGGCGGTCCCGAGGATCCCAAGGTGCAGCTGCGCGACTTTGACAGGATGCGGATCGAACCCGGCGAGACGAGGCAGTTCACCGGCCGCCTGACGCGCAGAGATCTGAGCAACTGGGACGTCACGGTGCAGGACTGGGTCATCAGCAGGTATCCCAAGACGGCATATGTTGGGAGGAGCAGCCGGAAGTTGGATCTCAAGAT TGAGCTTCCTTGA (SEQ IDNO: 131) MKAAALSCLFGSTLAVAGAIESRKVHQKPLARSEPFYPSPWMNPNADGWAEAYAQAKSFVSQMTLLEKVNLTTGVGWGAEQCVGQVGAIPRLGLRSLCMHDSPLGIRGADYNSAFPSGQTVAATWDRGLMYRRGYAMGQEAKGKGINVLLGPVAGPLGRMPEGGRNWEGFAPDPVLTGIGMSETIKGIQDAGVIACAKHFIGNEQEHFRQVPEAQGYGYNISETLSSNIDDKTMHELYLWPFADAVRAGVGSVMCSYNQVNNSYACQNSKLLNDLLKNELGFQGFVMSDWWAQHTGAASAVAGLDMSMPGDTMFNTGVSFWGANLTLAVLNGTVPAYRLDDMAMRIMAALFKVTKTTDLEPINFSFWTRDTYGPIHWAAKQGYQEINSHVDVRADHGNLIRNIAAKGTVLLKNTGSLPLNKPKFVAVIGEDAGPSPNGPNGCSDRGCNEGTLAMGWGSGTANYPYLVSPDAALQLRAIQDGTRYESVLSNYAEENTKALVSQANATAIVFVNADSGEGYINVDGNEGDRKNLTLWNNGDTLVKNVSSWCSNTIVVIHSVGPVLLTDWYDNPNITAILWAGLPGQESGNSITDVLYGKVNPAARSPFTWGKTRESYGADVLYKPNNGNWAPQQDFTEGVFIDYRYFDKVDDDSVIYEFGHGLSYTTFEYSNIRVVKSNVSEYRPTTGTTIQAPTFGNFSTDLEDYLFPKDEFPYIPQYIYPYLNTTDPRRASADPHYGQTAEEFLPPHATDDDPQPLLRSSGGNSPGGNRQLYDIVYTITADITNTGSVVGEEVPQLYVSLGGPEDPKVQLRDFDRMRIEPGETRQFTGRLTRRDLSNWDVTVQDWVISRY PKTAYVGRSSRKLDLKIELP(SEQ ID NO: 132) 
IESRKVHQKPLARSEPFYPSPWMNPNADGWAEAYAQAKSFVSQMTLLEKVNLTTGVGWGAEQCVGQVGAIPRLGLRSLCMHDSPLGIRGADYNSAFPSGQTVAATWDRGLMYRRGYAMGQEAKGKGINVLLGPVAGPLGRMPEGGRNWEGFAPDPVLTGIGMSETIKGIQDAGVIACAKHFIGNEQEHFRQVPEAQGYGYNISETLSSNIDDKTMHELYLWPFADAVRAGVGSVMCSYNQVNNSYACQNSKLLNDLLKNELGFQGFVMSDWWAQHTGAASAVAGLDMSMPGDTMFNTGVSFWGANLTLAVLNGTVPAYRLDDMAMRIMAALFKVTKTTDLEPINFSFWTRDTYGPIHWAAKQGYQEINSHVDVRADHGNLIRNIAAKGTVLLKNTGSLPLNKPKFVAVIGEDAGPSPNGPNGCSDRGCNEGTLAMGWGSGTANYPYLVSPDAALQLRAIQDGTRYESVLSNYAEENTKALVSQANATAIVFVNADSGEGYINVDGNEGDRKNLTLWNNGDTLVKNVSSWCSNTIVVIHSVGPVLLTDWYDNPNITAILWAGLPGQESGNSITDVLYGKVNPAARSPFTWGKTRESYGADVLYKPNNGNWAPQQDFTEGVFIDYRYFDKVDDDSVIYEFGHGLSYTTFEYSNIRVVKSNVSEYRPTTGTTIQAPTFGNFSTDLEDYLFPKDEFPYIPQYIYPYLNTTDPRRASADPHYGQTAEEFLPPHATDDDPQPLLRSSGGNSPGGNRQLYDIVYTITADITNTGSVVGEEVPQLYVSLGGPEDPKVQLRDFDRMRIEPGETRQFTGRLTRRDLSNWDVTVQDWVISRYPKTAYVGRSSRKLDLKIEL P
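The differences between a variant and the wild-type sequence can be listed by a position-by-position comparison. The sketch below assumes the two sequences have equal length (substitutions only, no insertions or deletions). The function name `substitutions` is hypothetical, and the strings are short internal fragments copied from SEQ ID NO:128 and SEQ ID NO:131, so the reported positions are relative to the fragment, not to the full-length protein.

```python
from typing import List, Tuple


def substitutions(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference residue, variant residue) for each mismatch.

    Assumes equal-length sequences, i.e. substitutions only.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Illustrative internal fragments only (from SEQ ID NO:128 and SEQ ID NO:131).
wild_type_fragment = "VMCSYQQVNNSYACQNSKLLNDLLKNELGFQGFVMSDWQAQHTG"
variant_883_fragment = "VMCSYNQVNNSYACQNSKLLNDLLKNELGFQGFVMSDWWAQHTG"

for pos, ref, var in substitutions(wild_type_fragment, variant_883_fragment):
    print(f"{ref}{pos}{var}")  # fragment-relative numbering
```

To obtain positions in the full-length protein, the comparison would be run on the complete sequences after removing any whitespace introduced by text extraction.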

The polynucleotide (SEQ ID NO:133) and amino acid (SEQ ID NO:134) sequences of a BGL variant (“Variant 900”) are provided below. The signal sequence is underlined in SEQ ID NO:134. SEQ ID NO:135 provides the sequence of this BGL variant, without the signal sequence.

(SEQ ID NO: 133) ATGAAGGCTGCTGCGCTTTCCTGCCTCTTCGGCAGTACCCTTGCCGTTGCAGGCGCCATTGAATCGAGAAAGGTTCACCAGAAGCCCCTCGCGAGATCTGAACCTTTTTACCCGTCGCCATGGATGAATCCCAACGCCATCGGCTGGGCGGAGGCCTATGCCCAGGCCAAGTCCTTTGTCTCCCAAATGACTCTGCTAGAGAAGGTCAACTTGACCACGGGAGTCGGCTGGGGGGAGGAGCAGTGCGTCGGCAACGTGGGCGCGATCCCTCGCCTTGGACTTCGCAGTCTGTGCATGCATGACTCCCCTCTCGGCGTGCGAGGAACCGACTACAACTCAGCGTTCCCCTCTGGCCAGACCGTTGCTGCTACCTGGGATCGCGGTCTGATGTACCGTCGCGGCTACGCAATGGGCCAGGAGGCCAAAGGCAAGGGCATCAATGTCCTTCTCGGACCAGTCGCCGGCCCCCTTGGCCGCATGCCCGAGGGCGGTCGTAACTGGGAAGGCTTCGCTCCGGATCCCGTCCTTACCGGCATCGGCATGTCCGAGACGATCAAGGGCATTCAGGATGCTGGCGTCATCGCTTGTGCGAAGCACTTTATTGGAAACGAGCAGGAGCACTTCAGACAGGTGCCAGAAGCCCAGGGATACGGTTACAACATCAGCGAAACCCTCTCCTCCAACATTGACGACAAGACCATGCACGAGCTCTACCTTTGGCCGTTTGCCGATGCCGTCCGGGCCGGCGTCGGCTCTGTCATGTGCTCGTACAACCAGGGCAACAACTCGTACGCCTGCCAGAACTCGAAGCTGCTGAACGACCTCCTCAAGAACGAGCTTGGGTTTCAGGGCTTCGTCATGAGCGACTGGTGGGCACAGCACACTGGCGCAGCAAGCGCCGTGGCTGGTCTCGATATGTCCATGCCGGGCGACACCATGGTCAACACTGGCGTCAGTTTCTGGGGCGCCAATCTCACCCTCGCCGTCCTCAACGGCACAGTCCCTGCCTACCGTCTCGACGACATGTGCATGCGCATCATGGCCGCCCTCTTCAAGGTCACCAAGACCACCGACCTGGAACCGATCAACTTCTCCTTCTGGACCCGCGACACTTATGGCCCGATCCACTGGGCCGCCAAGCAGGGCTACCAGGAGATTAATTCCCACGTTGACGTCCGCGCCGACCACGGCAACCTCATCCGGAACATTGCCGCCAAGGGTACGGTGCTGCTGAAGAATACCGGCTCTCTACCCCTGAACAAGCCAAAGTTCGTGGCCGTCATCGGCGAGGATGCTGGGCCGAGCCCCAACGGGCCCAACGGCTGCAGCGACCGCGGCTGTAACGAAGGCACGCTCGCCATGGGCTGGGGATCCGGCACAGCCAACTATCCGTACCTCGTTTCCCCCGACGCCGCGCTCCAGGCGCGGGCCATCCAGGACGGCACGAGGTACGAGAGCGTCCTGTCCAACTACGCCGAGGAAAATACAAAGGCTCTGGTCTCGCAGGCCAATGCAACCGCCATCGTCTTCGTCAATGCCGACTCAGGCGAGGGCTACATCAACGTGGACGGTAACGAGGGCGACCGTAAGAACCTGACTCTCTGGAACAACGGTGATACTCTGGTCAAGAACGTCTCGAGCTGGTGCAGCAACACCATCGTCGTCATCCACTCGGTCGGCCCGGTCCTCCTGACCGATTGGTACGACAACCCCAACATCACGGCCATTCTCTGGGCTGGTCTTCCGGGCCAGGAGTCGGGCAACTCCATCACCGACGTGCTTTACGGCAAGGTCAACCCCGCCGCCCGCTCGCCCTTCACTTGGGGCAAGACCCGCGAAAGCTATGGCGCGGACGTCCTGTACAAGCCGAATAATGGCAATTGGGCGCCCCAACAGGACTTCACCGAGGGCGTCTTCATCGACTACCGCTACTTCGACAAGGTTGACGATGACTCGGTCATCTACGAGTTCGGCCACGGCCTG
AGCTACACCACCTTCGAGTACAGCAACATCCGCGTCGTCAAGTCCAACGTCAGCGAGTACCGGCCCACGACGGGCACCACGATTCAGGCCCCGACGTTTGGCAACTTCTCCACCGACCTCGAGGACTATCTCTTCCCCAAGGACGAGTTCCCCTACATCCCGCAGTACATCTACCCGTACCTCAACACGACCGACCCCCGGAGGGCCTCGGGCGATCCCCACTACGGCCAGACCGCCGAGGAGTTCCTCCCGCCCCACGCCACCGATGACGACCCCCAGCCGCTCCTCCGGTCCTCGGGCGGAAACTCCCCCGGCGGCAACCGCCAGCTGTACGACATTGTCTACACAATCACGGCCGACATCACGAATACGGGCTCCGTTGTAGGCGAGGAGGTACCGCAGCTCTACGTCTCGCTGGGCGGTCCCGAGGATCCCAAGGTGCAGCTGCGCGACTTTGACAGGATGCGGATCGAACCCGGCGAGACGAGGCAGTTCACCGGCCGCCTGACGCGCAGAGATCTGAGCAACTGGGACGTCACGGTGCAGGACTGGGTCATCAGCAGGTATCCCAAGACGGCATATGTTGGGAGGAGCAGCCGGAAGTTGGATCTCAAGAT TGAGCTTCCTTGA (SEQ IDNO: 134) MKAAALSCLFGSTLAVAGAIESRKVHQKPLARSEPFYPSPWMNPNAIGWAEAYAQAKSFVSQMTLLEKVNLTTGVGWGEEQCVGNVGAIPRLGLRSLCMHDSPLGVRGTDYNSAFPSGQTVAATWDRGLMYRRGYAMGQEAKGKGINVLLGPVAGPLGRMPEGGRNWEGFAPDPVLTGIGMSETIKGIQDAGVIACAKHFIGNEQEHFRQVPEAQGYGYNISETLSSNIDDKTMHELYLWPFADAVRAGVGSVMCSYNQGNNSYACQNSKLLNDLLKNELGFQGFVMSDWWAQHTGAASAVAGLDMSMPGDTMVNTGVSFWGANLTLAVLNGTVPAYRLDDMCMRIMAALFKVTKTTDLEPINFSFWTRDTYGPIHWAAKQGYQEINSHVDVRADHGNLIRNIAAKGTVLLKNTGSLPLNKPKFVAVIGEDAGPSPNGPNGCSDRGCNEGTLAMGWGSGTANYPYLVSPDAALQARAIQDGTRYESVLSNYAEENTKALVSQANATAIVFVNADSGEGYINVDGNEGDRKNLTLWNNGDTLVKNVSSWCSNTIVVIHSVGPVLLTDWYDNPNITAILWAGLPGQESGNSITDVLYGKVNPAARSPFTWGKTRESYGADVLYKPNNGNWAPQQDFTEGVFIDYRYFDKVDDDSVIYEFGHGLSYTTFEYSNIRVVKSNVSEYRPTTGTTIQAPTFGNFSTDLEDYLFPKDEFPYIPQYIYPYLNTTDPRRASGDPHYGQTAEEFLPPHATDDDPQPLLRSSGGNSPGGNRQLYDIVYTITADITNTGSVVGEEVPQLYVSLGGPEDPKVQLRDFDRMRIEPGETRQFTGRLTRRDLSNWDVTVQDWVISRY PKTAYVGRSSRKLDLKIELP(SEQ ID NO: 135) 
IESRKVHQKPLARSEPFYPSPWMNPNAIGWAEAYAQAKSFVSQMTLLEKVNLTTGVGWGEEQCVGNVGAIPRLGLRSLCMHDSPLGVRGTDYNSAFPSGQTVAATWDRGLMYRRGYAMGQEAKGKGINVLLGPVAGPLGRMPEGGRNWEGFAPDPVLTGIGMSETIKGIQDAGVIACAKHFIGNEQEHFRQVPEAQGYGYNISETLSSNIDDKTMHELYLWPFADAVRAGVGSVMCSYNQGNNSYACQNSKLLNDLLKNELGFQGFVMSDWWAQHTGAASAVAGLDMSMPGDTMVNTGVSFWGANLTLAVLNGTVPAYRLDDMCMRIMAALFKVTKTTDLEPINFSFWTRDTYGPIHWAAKQGYQEINSHVDVRADHGNLIRNIAAKGTVLLKNTGSLPLNKPKFVAVIGEDAGPSPNGPNGCSDRGCNEGTLAMGWGSGTANYPYLVSPDAALQARAIQDGTRYESVLSNYAEENTKALVSQANATAIVFVNADSGEGYINVDGNEGDRKNLTLWNNGDTLVKNVSSWCSNTIVVIHSVGPVLLTDWYDNPNITAILWAGLPGQESGNSITDVLYGKVNPAARSPFTWGKTRESYGADVLYKPNNGNWAPQQDFTEGVFIDYRYFDKVDDDSVIYEFGHGLSYTTFEYSNIRVVKSNVSEYRPTTGTTIQAPTFGNFSTDLEDYLFPKDEFPYIPQYIYPYLNTTDPRRASGDPHYGQTAEEFLPPHATDDDPQPLLRSSGGNSPGGNRQLYDIVYTITADITNTGSVVGEEVPQLYVSLGGPEDPKVQLRDFDRMRIEPGETRQFTGRLTRRDLSNWDVTVQDWVISRYPKTAYVGRSSRKLDLKIEL P

The polynucleotide (SEQ ID NO:136) and amino acid (SEQ ID NO:137) sequences of wild-type Talaromyces emersonii CBH1 are provided below. The signal sequence is shown underlined in SEQ ID NO:137. SEQ ID NO:138 provides the sequence of this CBH1, without the signal sequence.

(SEQ ID NO: 136) ATGCTTCGACGGGCTCTTCTTCTATCCTCTTCCGCCATCCTTGCTGTCAAGGCACAGCAGGCCGGCACGGCGACGGCAGAGAACCACCCGCCCCTGACATGGCAGGAATGCACCGCCCCTGGGAGCTGCACCACCCAGAACGGGGCGGTCGTTCTTGATGCGAACTGGCGTTGGGTGCACGATGTGAACGGATACACCAACTGCTACACGGGCAATACCTGGGACCCCACGTACTGCCCTGACGACGAAACCTGCGCCCAGAACTGTGCGCTGGACGGCGCGGATTACGAGGGCACCTACGGCGTGACTTCGTCGGGCAGCTCCTTGAAACTCAATTTCGTCACCGGGTCGAACGTCGGATCCCGTCTCTACCTGCTGCAGGACGACTCGACCTATCAGATCTTCAAGCTTCTGAACCGCGAGTTCAGCTTTGACGTCGATGTCTCCAATCTTCCGTGCGGATTGAACGGCGCTCTGTACTTTGTCGCCATGGACGCCGACGGCGGCGTGTCCAAGTACCCGAACAACAAGGCTGGTGCCAAGTACGGAACCGGGTATTGCGACTCCCAATGCCCACGGGACCTCAAGTTCATCGACGGCGAGGCCAACGTCGAGGGCTGGCAGCCGTCTTCGAACAACGCCAACACCGGAATTGGCGACCACGGCTCCTGCTGTGCGGAGATGGATGTCTGGGAAGCAAACAGCATCTCCAATGCGGTCACTCCGCACCCGTGCGACACGCCAGGCCAGACGATGTGCTCTGGAGATGACTGCGGTGGCACATACTCTAACGATCGCTACGCGGGAACCTGCGATCCTGACGGCTGTGACTTCAACCCTTACCGCATGGGCAACACTTCTTTCTACGGGCCTGGCAAGATCATCGATACCACCAAGCCCTTCACTGTCGTGACGCAGTTCCTCACTGATGATGGTACGGATACTGGAACTCTCAGCGAGATCAAGCGCTTCTACATCCAGAACAGCAACGTCATTCCGCAGCCCAACTCGGACATCAGTGGCGTGACCGGCAACTCGATCACGACGGAGTTCTGCACTGCTCAGAAGCAGGCCTTTGGCGACACGGACGACTTCTCTCAGCACGGTGGCCTGGCCAAGATGGGAGCGGCCATGCAGCAGGGTATGGTCCTGGTGATGAGTTTGTGGGACGACTACGCCGCGCAGATGCTGTGGTTGGATTCCGACTACCCGACGGATGCGGACCCCACGACCCCTGGTATTGCCCGTGGAACGTGTCCGACGGACTCGGGCGTCCCATCGGATGTCGAGTCGCAGAGCCCCAACTCCTACGTGACCTACTCGAACATTAAGTTTGGTCCGATCAACTCG ACCTTCACCGCTTCGTGA(SEQ ID NO: 137) MLRRALLLSSSAILAVKAQQAGTATAENHPPLTWQECTAPGSCTTQNGAVVLDANWRWVHDVNGYTNCYTGNTWDPTYCPDDETCAQNCALDGADYEGTYGVTSSGSSLKLNFVTGSNVGSRLYLLQDDSTYQIFKLLNREFSFDVDVSNLPCGLNGALYFVAMDADGGVSKYPNNKAGAKYGTGYCDSQCPRDLKFIDGEANVEGWQPSSNNANTGIGDHGSCCAEMDVWEANSISNAVTPHPCDTPGQTMCSGDDCGGTYSNDRYAGTCDPDGCDFNPYRMGNTSFYGPGKIIDTTKPFTVVTQFLTDDGTDTGTLSEIKRFYIQNSNVIPQPNSDISGVTGNSITTEFCTAQKQAFGDTDDFSQHGGLAKMGAAMQQGMVLVMSLWDDYAAQMLWLDSDYPTDADPTTPGIARGTCPTDSGVPSDVESQSPNSYVTYSNIKFGPINS TFTAS (SEQ ID NO:138) 
QQAGTATAENHPPLTWQECTAPGSCTTQNGAVVLDANWRWVHDVNGYTNCYTGNTWDPTYCPDDETCAQNCALDGADYEGTYGVTSSGSSLKLNFVTGSNVGSRLYLLQDDSTYQIFKLLNREFSFDVDVSNLPCGLNGALYFVAMDADGGVSKYPNNKAGAKYGTGYCDSQCPRDLKFIDGEANVEGWQPSSNNANTGIGDHGSCCAEMDVWEANSISNAVTPHPCDTPGQTMCSGDDCGGTYSNDRYAGTCDPDGCDFNPYRMGNTSFYGPGKIIDTTKPFTVVTQFLTDDGTDTGTLSEIKRFYIQNSNVIPQPNSDISGVTGNSITTEFCTAQKQAFGDTDDFSQHGGLAKMGAAMQQGMVLVMSLWDDYAAQMLWLDSDYPTDADPTTPGIARGTCPTDSGVPSDVESQSPNSYVTYSNIKFGPINSTFTAS

The polynucleotide (SEQ ID NO:139) and amino acid (SEQ ID NO:140) sequences of wild-type M. thermophila CBH1a are provided below. The signal sequence is shown underlined in SEQ ID NO:140. SEQ ID NO:141 provides the sequence of this CBH1a, without the signal sequence.

(SEQ ID NO: 139) ATGTACGCCAAGTTCGCGACCCTCGCCGCCCTTGTGGCTGGCGCCGCTGCTCAGAACGCCTGCACTCTGACCGCTGAGAACCACCCCTCGCTGACGTGGTCCAAGTGCACGTCTGGCGGCAGCTGCACCAGCGTCCAGGGTTCCATCACCATCGACGCCAACTGGCGGTGGACTCACCGGACCGATAGCGCCACCAACTGCTACGAGGGCAACAAGTGGGATACTTCGTACTGCAGCGATGGTCCTTCTTGCGCCTCCAAGTGCTGCATCGACGGCGCTGACTACTCGAGCACCTATGGCATCACCACGAGCGGTAACTCCCTGAACCTCAAGTTCGTCACCAAGGGCCAGTACTCGACCAACATCGGCTCGCGTACCTACCTGATGGAGAGCGACACCAAGTACCAGATGTTCCAGCTCCTCGGCAACGAGTTCACCTTCGATGTCGACGTCTCCAACCTCGGCTGCGGCCTCAATGGCGCCCTCTACTTCGTGTCCATGGATGCCGATGGTGGCATGTCCAAGTACTCGGGCAACAAGGCAGGTGCCAAGTACGGTACCGGCTACTGTGATTCTCAGTGCCCCCGCGACCTCAAGTTCATCAACGGCGAGGCCAACGTAGAGAACTGGCAGAGCTCGACCAACGATGCCAACGCCGGCACGGGCAAGTACGGCAGCTGCTGCTCCGAGATGGACGTCTGGGAGGCCAACAACATGGCCGCCGCCTTCACTCCCCACCCTTGCACCGTGATCGGCCAGTCGCGCTGCGAGGGCGACTCGTGCGGCGGTACCTACAGCACCGACCGCTATGCCGGCATCTGCGACCCCGACGGATGCGACTTCAACTCGTACCGCCAGGGCAACAAGACCTTCTACGGCAAGGGCATGACGGTCGACACGACCAAGAAGATCACGGTCGTCACCCAGTTCCTCAAGAACTCGGCCGGCGAGCTCTCCGAGATCAAGCGGTTCTACGTCCAGAACGGCAAGGTCATCCCCAACTCCGAGTCCACCATCCCGGGCGTCGAGGGCAACTCCATCACCCAGGACTGGTGCGACCGCCAGAAGGCCGCCTTCGGCGACGTGACCGACTTCCAGGACAAGGGCGGCATGGTCCAGATGGGCAAGGCCCTCGCGGGGCCCATGGTCCTCGTCATGTCCATCTGGGACGACCACGCCGTCAACATGCTCTGGCTCGACTCCACCTGGCCCATCGACGGCGCCGGCAAGCCGGGCGCCGAGCGCGGTGCCTGCCCCACCACCTCGGGCGTCCCCGCTGAGGTCGAGGCCGAGGCCCCCAACTCCAACGTCATCTTCTCCAACATCCGCTTCGGCCCCATCGGCTCCACCGTCTCCGGCCTGCCCGACGGCGGCAGCGGCAACCCCAACCCGCCCGTCAGCTCGTCCACCCCGGTCCCCTCCTCGTCCACCACATCCTCCGGTTCCTCCGGCCCGACTGGCGGCACGGGTGTCGCTAAGCACTATGAGCAATGCGGAGGAATCGGGTTCACTGGCCCTACCCAGTGCGAGAGCCCCTACACTTGCACCAAGCTGAATGACTGGTACTCGCAGTGCCTGTAA (SEQ ID NO: 
140)MYAKFATLAALVAGAAAQNACTLTAENHPSLTYSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSWCSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQMFQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRDLKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCTVIGQSRCEGDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKGMTVDTTKKITVVTQFLKNSAGELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDFQDKGGMVQMGKALAGPMVLVMSIWDDHAVNMLWLDSTWPIDGAGKPGAERGACPTTSGVPAEVEAEAPNSNVIFSNIRFGPIGSTVSGLPDGGSGNPNPPVSSSTPVPSSSTTSSGSSGPTGGTGVAKHYEQCGGIGFTGPTQCESPYTCTKLNDWYSQCL (SEQ ID NO: 141)QNACTLTAENHPSLTYSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSWCSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQMFQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRDLKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCTVIGQSRCEGDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKGMTVDTTKKITVVTQFLKNSAGELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDFQDKGGMVQMGKALAGPMVLVMSIWDDHAVNMLWLDSTWPIDGAGKPGAERGACPTTSGVPAEVEAEAPNSNVIFSNIRFGPIGSTVSGLPDGGSGNPNPPVSSSTPVPSSSTTSSGSSGPTGGTGVAKHYEQCGGIGFTGPTQCESPYTCTK LNDWYSQCL

The polynucleotide (SEQ ID NO:142) and amino acid (SEQ ID NO:143) sequences of an M. thermophila CBH1a variant (“Variant 145”) are provided below. The signal sequence is shown underlined in SEQ ID NO:143. SEQ ID NO:144 provides the sequence of this CBH1a variant, without the signal sequence.

(SEQ ID NO: 142) ATGTACGCCAAGTTCGCGACCCTCGCCGCCCTTGTGGCTGGCGCCGCTGCTCAGAACGCCTGCACTCTGACCGCTGAGAACCACCCCTCGCTGACGTGGTCCAAGTGCACGTCTGGCGGCAGCTGCACCAGCGTCCAGGGTTCCATCACCATCGACGCCAACTGGCGGTGGACTCACCGGACCGATAGCGCCACCAACTGCTACGAGGGCAACAAGTGGGATACTTCGTGGTGCAGCGATGGTCCTTCTTGCGCCTCCAAGTGCTGCATCGACGGCGCTGACTACTCGAGCACCTATGGCATCACCACGAGCGGTAACTCCCTGAACCTCAAGTTCGTCACCAAGGGCCAGTACTCGACCAACATCGGCTCGCGTACCTACCTGATGGAGAGCGACACCAAGTACCAGATGTTCCAGCTCCTCGGCAACGAGTTCACCTTCGATGTCGACGTCTCCAACCTCGGCTGCGGCCTCAATGGCGCCCTCTACTTCGTGTCCATGGATGCCGATGGTGGCATGTCCAAGTACTCGGGCAACAAGGCAGGTGCCAAGTACGGTACCGGCTACTGTGATTCTCAGTGCCCCCGCGACCTCAAGTTCATCAACGGCGAGGCCAACGTAGAGAACTGGCAGAGCTCGACCAACGATGCCAACGCCGGCACGGGCAAGTACGGCAGCTGCTGCTCCGAGATGGACGTCTGGGAGGCCAACAACATGGCCGCCGCCTTCACTCCCCACCCTTGCACCGTGATCGGCCAGTCGCGCTGCGAGGGCGACTCGTGCGGCGGTACCTACAGCACCGACCGCTATGCCGGCATCTGCGACCCCGACGGATGCGACTTCAACTCGTACCGCCAGGGCAACAAGACCTTCTACGGCAAGGGCATGACGGTCGACACGACCAAGAAGATCACGGTCGTCACCCAGTTCCTCAAGAACTCGGCCGGCGAGCTCTCCGAGATCAAGCGGTTCTACGTCCAGAACGGCAAGGTCATCCCCAACTCCGAGTCCACCATCCCGGGCGTCGAGGGCAACTCCATCACCCAGGACTGGTGCGACCGCCAGAAGGCCGCCTTCGGCGACGTGACCGACTTCCAGGACAAGGGCGGCATGGTCCAGATGGGCAAGGCCCTCGCGGGGCCCATGGTCCTCGTCATGTCCATCTGGGACGACCACGCCGTCAACATGCTCTGGCTCGACTCCACCTGGCCCATCGACGGCGCCGGCAAGCCGGGCGCCGAGCGCGGTGCCTGCCCCACCACCTCGGGCGTCCCCGCTGAGGTCGAGGCCGAGGCCCCCAACTCCAACGTCATCTTCTCCAACATCCGCTTCGGCCCCATCGGCTCCACCGTCTCCGGCCTGCCCGACGGCGGCAGCGGCAACCCCAACCCGCCCGTCAGCTCGTCCACCCCGGTCCCCTCCTCGTCCACCACATCCTCCGGTTCCTCCGGCCCGACTGGCGGCACGGGTGTCGCTAAGCACTATGAGCAATGCGGAGGAATCGGGTTCACTGGCCCTACCCAGTGCGAGAGCCCCTACACTTGCACCAAGCTGAATGACTGGTACTCGCAGTGCCTGTAA (SEQ ID NO: 
143)MYAKFATLAALVAGAAAQNACTLTAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSWCSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQMFQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRDLKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCTVIGQSRCEGDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKGMTVDTTKKITVVTQFLKNSAGELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDFQDKGGMVQMGKALAGPMVLVMSIWDDHAVNMLWLDSTWPIDGAGKPGAERGACPTTSGVPAEVEAEAPNSNVIFSNIRFGPIGSTVSGLPDGGSGNPNPPVSSSTPVPSSSTTSSGSSGPTGGTGVAKHYEQCGGIGFTGPTQCESPYTCTKLNDWYSQCL (SEQ ID NO: 144)QNACTLTAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSWCSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQMFQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRDLKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCTVIGQSRCEGDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKGMTVDTTKKITVVTQFLKNSAGELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQDWCDRQKAAFGDVTDFQDKGGMVQMGKALAGPMVLVMSIWDDHAVNMLWLDSTWPIDGAGKPGAERGACPTTSGVPAEVEAEAPNSNVIFSNIRFGPIGSTVSGLPDGGSGNPNPPVSSSTPVPSSSTTSSGSSGPTGGTGVAKHYEQCGGIGFTGPTQCESPYTCTK LNDWYSQCL

The polynucleotide (SEQ ID NO:145) and amino acid (SEQ ID NO:146) sequences of an M. thermophila CBH1a variant (“Variant 983”) are provided below. The signal sequence is shown underlined in SEQ ID NO:146. SEQ ID NO:147 provides the sequence of this CBH1a variant, without the signal sequence.

(SEQ ID NO: 145) ATGTACGCCAAGTTCGCGACCCTCGCCGCCCTTGTGGCTGGCGCCGCTGCTCAGAACGCCTGCACTCTGAACGCTGAGAACCACCCCTCGCTGACGTGGTCCAAGTGCACGTCTGGCGGCAGCTGCACCAGCGTCCAGGGTTCCATCACCATCGACGCCAACTGGCGGTGGACTCACCGGACCGATAGCGCCACCAACTGCTACGAGGGCAACAAGTGGGATACTTCGTACTGCAGCGATGGTCCTTCTTGCGCCTCCAAGTGCTGCATCGACGGCGCTGACTACTCGAGCACCTATGGCATCACCACGAGCGGTAACTCCCTGAACCTCAAGTTCGTCACCAAGGGCCAGTACTCGACCAACATCGGCTCGCGTACCTACCTGATGGAGAGCGACACCAAGTACCAGATGTTCCAGCTCCTCGGCAACGAGTTCACCTTCGATGTCGACGTCTCCAACCTCGGCTGCGGCCTCAATGGCGCCCTCTACTTCGTGTCCATGGATGCCGATGGTGGCATGTCCAAGTACTCGGGCAACAAGGCAGGTGCCAAGTACGGTACCGGCTACTGTGATTCTCAGTGCCCCCGCGACCTCAAGTTCATCAACGGCGAGGCCAACGTAGAGAACTGGCAGAGCTCGACCAACGATGCCAACGCCGGCACGGGCAAGTACGGCAGCTGCTGCTCCGAGATGGACGTCTGGGAGGCCAACAACATGGCCGCCGCCTTCACTCCCCACCCTTGCACCGTGATCGGCCAGTCGCGCTGCGAGGGCGACTCGTGCGGCGGTACCTACAGCACCGACCGCTATGCCGGCATCTGCGACCCCGACGGATGCGACTTCAACTCGTACCGCCAGGGCAACAAGACCTTCTACGGCAAGGGCATGACGGTCGACACGACCAAGAAGATCACGGTCGTCACCCAGTTCCTCAAGAACTCGGCCGGCGAGCTCTCCGAGATCAAGCGGTTCTACGTCCAGAACGGCAAGGTCATCCCCAACTCCGAGTCCACCATCCCGGGCGTCGAGGGCAACTCCATCACCCAGGAGTACTGCGACCGCCAGAAGGCCGCCTTCGGCGACGTGACCGACTTCCAGGACAAGGGCGGCATGGTCCAGATGGGCAAGGCCCTCGCGGGGCCCATGGTCCTCGTCATGTCCATCTGGGACGACCACGCCGACAACATGCTCTGGCTCGACTCCACCTGGCCCATCGACGGCGCCGGCAAGCCGGGCGCCGAGCGCGGTGCCTGCCCCACCACCTCGGGCGTCCCCGCTGAGGTCGAGGCCGAGGCCCCCAACTCCAACGTCATCTTCTCCAACATCCGCTTCGGCCCCATCGGCTCCACCGTCTCCGGCCTGCCCGACGGCGGCAGCGGCAACCCCAACCCGCCCGTCAGCTCGTCCACCCCGGTCCCCTCCTCGTCCACCACATCCTCCGGTTCCTCCGGCCCGACTGGCGGCACGGGTGTCGCTAAGCACTATGAGCAATGCGGAGGAATCGGGTTCACTGGCCCTACCCAGTGCGAGAGCCCCTACACTTGCACCAAGCTGAATGACTGGTACTCGCAGTGCCTGTAA (SEQ ID NO: 
146)MYAKFATLAALVAGAAAQNACTLNAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSYCSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQMFQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRDLKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCTVIGQSRCEGDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKGMTVDTTKKITVVTQFLKNSAGELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQEYCDRQKAAFGDVTDFQDKGGMVQMGKALAGPMVLVMSIWDDHADNMLWLDSTWPIDGAGKPGAERGACPTTSGVPAEVEAEAPNSNVIFSNIRFGPIGSTVSGLPDGGSGNPNPPVSSSTPVPSSSTTSSGSSGPTGGTGVAKHYEQCGGIGFTGPTQCESPYTCTKLNDWYSQCL (SEQ ID NO: 147)QNACTLNAENHPSLTWSKCTSGGSCTSVQGSITIDANWRWTHRTDSATNCYEGNKWDTSYCSDGPSCASKCCIDGADYSSTYGITTSGNSLNLKFVTKGQYSTNIGSRTYLMESDTKYQMFQLLGNEFTFDVDVSNLGCGLNGALYFVSMDADGGMSKYSGNKAGAKYGTGYCDSQCPRDLKFINGEANVENWQSSTNDANAGTGKYGSCCSEMDVWEANNMAAAFTPHPCTVIGQSRCEGDSCGGTYSTDRYAGICDPDGCDFNSYRQGNKTFYGKGMTVDTTKKITVVTQFLKNSAGELSEIKRFYVQNGKVIPNSESTIPGVEGNSITQEYCDRQKAAFGDVTDFQDKGGMVQMGKALAGPMVLVMSIWDDHADNMLWLDSTWPIDGAGKPGAERGACPTTSGVPAEVEAEAPNSNVIFSNIRFGPIGSTVSGLPDGGSGNPNPPVSSSTPVPSSSTTSSGSSGPTGGTGVAKHYEQCGGIGFTGPTQCESPYTCTK LNDWYSQCL

The polynucleotide (SEQ ID NO:148) and amino acid (SEQ ID NO:149) sequences of wild-type M. thermophila CBH2b are provided below. The signal sequence is shown underlined in SEQ ID NO:149. SEQ ID NO:150 provides the sequence of this CBH2b, without the signal sequence.

(SEQ ID NO: 148) ATGGCCAAGAAGCTTTTCATCACCGCCGCGCTTGCGGCTGCCGTGTTGGCGGCCCCCGTCATTGAGGAGCGCCAGAACTGCGGCGCTGTGTGGACTCAATGCGGCGGTAACGGGTGGCAAGGTCCCACATGCTGCGCCTCGGGCTCGACCTGCGTTGCGCAGAACGAGTGGTACTCTCAGTGCCTGCCCAACAGCCAGGTGACGAGTTCCACCACTCCGTCGTCGACTTCCACCTCGCAGCGCAGCACCAGCACCTCCAGCAGCACCACCAGGAGCGGCAGCTCCTCCTCCTCCTCCACCACGCCCCCGCCCGTCTCCAGCCCCGTGACCAGCATTCCCGGCGGTGCGACCTCCACGGCGAGCTACTCTGGCAACCCCTTCTCGGGCGTCCGGCTCTTCGCCAACGACTACTACAGGTCCGAGGTCCACAATCTCGCCATTCCTAGCATGACTGGTACTCTGGCGGCCAAGGCTTCCGCCGTCGCCGAAGTCCCTAGCTTCCAGTGGCTCGACCGGAACGTCACCATCGACACCCTGATGGTCCAGACTCTGTCCCAGGTCCGGGCTCTCAATAAGGCCGGTGCCAATCCTCCCTATGCTGCCCAACTCGTCGTCTACGACCTCCCCGACCGTGACTGTGCCGCCGCTGCGTCCAACGGCGAGTTTTCGATTGCAAACGGCGGCGCCGCCAACTACAGGAGCTACATCGACGCTATCCGCAAGCACATCATTGAGTACTCGGACATCCGGATCATCCTGGTTATCGAGCCCGACTCGATGGCCAACATGGTGACCAACATGAACGTGGCCAAGTGCAGCAACGCCGCGTCGACGTACCACGAGTTGACCGTGTACGCGCTCAAGCAGCTGAACCTGCCCAACGTCGCCATGTATCTCGACGCCGGCCACGCCGGCTGGCTCGGCTGGCCCGCCAACATCCAGCCCGCCGCCGAGCTGTTTGCCGGCATCTACAATGATGCCGGCAAGCCGGCTGCCGTCCGCGGCCTGGCCACTAACGTCGCCAACTACAACGCCTGGAGCATCGCTTCGGCCCCGTCGTACACGTCGCCTAACCCTAACTACGACGAGAAGCACTACATCGAGGCCTTCAGCCCGCTCTTGAACTCGGCCGGCTTCCCCGCACGCTTCATTGTCGACACTGGCCGCAACGGCAAACAACCTACCGGCCAACAACAGTGGGGTGACTGGTGCAATGTCAAGGGCACCGGCTTTGGCGTGCGCCCGACGGCCAACACGGGCCACGAGCTGGTCGATGCCTTTGTCTGGGTCAAGCCCGGCGGCGAGTCCGACGGCACAAGCGACACCAGCGCCGCCCGCTACGACTACCACTGCGGCCTGTCCGATGCCCTGCAGCCTGCCCCCGAGGCTGGACAGTGGTTCCAGGCCTACTTCGAGCAGCTGCTCACCAACGCCAACCCGCCCTTCTAA (SEQ ID NO: 149)MAKKLFITAALAAAVLAAPVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPPPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVHNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVQTLSQVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGAANYRSYIDAIRKHIIEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIEAFSPLLNSAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWFQAYFEQLLTNANPPF (SEQ ID NO: 
150)APVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPPPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVHNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVQTLSQVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGAANYRSYIDAIRKHIIEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIEAFSPLLNSAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWF QAYFEQLLTNANPPF

The polynucleotide (SEQ ID NO:151) and amino acid (SEQ ID NO:152) sequences of an M. thermophila CBH2b variant (“Variant 196”) are provided below. The signal sequence is shown underlined in SEQ ID NO:152. SEQ ID NO:153 provides the sequence of this CBH2b variant, without the signal sequence.

(SEQ ID NO: 151) ATGGCCAAGAAGCTTTTCATCACCGCCGCGCTTGCGGCTGCCGTGTTGGCGGCCCCCGTCATTGAGGAGCGCCAGAACTGCGGCGCTGTGTGGACTCAATGCGGCGGTAACGGGTGGCAAGGTCCCACATGCTGCGCCTCGGGCTCGACCTGCGTTGCGCAGAACGAGTGGTACTCTCAGTGCCTGCCCAACAGCCAGGTGACGAGTTCCACCACTCCGTCGTCGACTTCCACCTCGCAGCGCAGCACCAGCACCTCCAGCAGCACCACCAGGAGCGGCAGCTCCTCCTCCTCCTCCACCACGCCCACCCCCGTCTCCAGCCCCGTGACCAGCATTCCCGGCGGTGCGACCTCCACGGCGAGCTACTCTGGCAACCCCTTCTCGGGCGTCCGGCTCTTCGCCAACGACTACTACAGGTCCGAGGTCCACAATCTCGCCATTCCTAGCATGACTGGTACTCTGGCGGCCAAGGCTTCCGCCGTCGCCGAAGTCCCTAGCTTCCAGTGGCTCGACCGGAACGTCACCATCGACACCCTGATGGTCCCGACTCTGTCCCGCGTCCGGGCTCTCAATAAGGCCGGTGCCAATCCTCCCTATGCTGCCCAACTCGTCGTCTACGACCTCCCCGACCGTGACTGTGCCGCCGCTGCGTCCAACGGCGAGTTTTCGATTGCAAACGGCGGCGCCGCCAACTACAGGAGCTACATCGACGCTATCCGCAAGCACATCATTGAGTACTCGGACATCCGGATCATCCTGGTTATCGAGCCCGACTCGATGGCCAACATGGTGACCAACATGAACGTGGCCAAGTGCAGCAACGCCGCGTCGACGTACCACGAGTTGACCGTGTACGCGCTCAAGCAGCTGAACCTGCCCAACGTCGCCATGTATCTCGACGCCGGCCACGCCGGCTGGCTCGGCTGGCCCGCCAACATCCAGCCCGCCGCCGAGCTGTTTGCCGGCATCTACAATGATGCCGGCAAGCCGGCTGCCGTCCGCGGCCTGGCCACTAACGTCGCCAACTACAACGCCTGGAGCATCGCTTCGGCCCCGTCGTACACGTCGCCTAACCCTAACTACGACGAGAAGCACTACATCGAGGCCTTCAGCCCGCTCTTGAACTCGGCCGGCTTCCCCGCACGCTTCATTGTCGACACTGGCCGCAACGGCAAACAACCTACCGGCCAACAACAGTGGGGTGACTGGTGCAATGTCAAGGGCACCGGCTTTGGCGTGCGCCCGACGGCCAACACGGGCCACGAGCTGGTCGATGCCTTTGTCTGGGTCAAGCCCGGCGGCGAGTCCGACGGCACAAGCGACACCAGCGCCGCCCGCTACGACTACCACTGCGGCCTGTCCGATGCCCTGCAGCCTGCCCCCGAGGCTGGACAGTGGTTCCAGGCCTACTTCGAGCAGCTGCTCACCAACGCCAACCCGCCCTTCTAA (SEQ ID NO: 152)MAKKLFITAALAAAVLAAPVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPTPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVHNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVPTLSRVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGAANYRSYIDAIRKHIIEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIEAFSPLLNSAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWFQAYFEQLLTNANPPF (SEQ ID NO: 
153)APVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPTPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVHNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVPTLSRVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGAANYRSYIDAIRKHIIEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIEAFSPLLNSAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWF QAYFEQLLTNANPPF

The polynucleotide (SEQ ID NO:154) and amino acid (SEQ ID NO:155) sequences of an M. thermophila CBH2b variant (“Variant 287”) are provided below. The signal sequence is shown underlined in SEQ ID NO:155. SEQ ID NO:156 provides the sequence of this CBH2b variant, without the signal sequence.

(SEQ ID NO: 154) ATGGCCAAGAAGCTTTTCATCACCGCCGCGCTTGCGGCTGCCGTGTTGGCGGCCCCCGTCATTGAGGAGCGCCAGAACTGCGGCGCTGTGTGGACTCAATGCGGCGGTAACGGGTGGCAAGGTCCCACATGCTGCGCCTCGGGCTCGACCTGCGTTGCGCAGAACGAGTGGTACTCTCAGTGCCTGCCCAACAGCCAGGTGACGAGTTCCACCACTCCGTCGTCGACTTCCACCTCGCAGCGCAGCACCAGCACCTCCAGCAGCACCACCAGGAGCGGCAGCTCCTCCTCCTCCTCCACCACGCCCCCGCCCGTCTCCAGCCCCGTGACCAGCATTCCCGGCGGTGCGACCTCCACGGCGAGCTACTCTGGCAACCCCTTCTCGGGCGTCCGGCTCTTCGCCAACGACTACTACAGGTCCGAGGTCCACAATCTCGCCATTCCTAGCATGACTGGTACTCTGGCGGCCAAGGCTTCCGCCGTCGCCGAAGTCCCTAGCTTCCAGTGGCTCGACCGGAACGTCACCATCGACACCCTGATGGTCCCGACTCTGTCCCGCGTCCGGGCTCTCAATAAGGCCGGTGCCAATCCTCCCTATGCTGCCCAACTCGTCGTCTACGACCTCCCCGACCGTGACTGTGCCGCCGCTGCGTCCAACGGCGAGTTTTCGATTGCAAACGGCGGCGCCGCCAACTACAGGAGCTACATCGACGCTATCCGCAAGCACATCAAGGAGTACTCGGACATCCGGATCATCCTGGTTATCGAGCCCGACTCGATGGCCAACATGGTGACCAACATGAACGTGGCCAAGTGCAGCAACGCCGCGTCGACGTACCACGAGTTGACCGTGTACGCGCTCAAGCAGCTGAACCTGCCCAACGTCGCCATGTATCTCGACGCCGGCCACGCCGGCTGGCTCGGCTGGCCCGCCAACATCCAGCCCGCCGCCGAGCTGTTTGCCGGCATCTACAATGATGCCGGCAAGCCGGCTGCCGTCCGCGGCCTGGCCACTAACGTCGCCAACTACAACGCCTGGAGCATCGCTTCGGCCCCGTCGTACACGTCGCCTAACCCTAACTACGACGAGAAGCACTACATCGAGGCCTTCAGCCCGCTCTTGAACGACGCCGGCTTCCCCGCACGCTTCATTGTCGACACTGGCCGCAACGGCAAACAACCTACCGGCCAACAACAGTGGGGTGACTGGTGCAATGTCAAGGGCACCGGCTTTGGCGTGCGCCCGACGGCCAACACGGGCCACGAGCTGGTCGATGCCTTTGTCTGGGTCAAGCCCGGCGGCGAGTCCGACGGCACAAGCGACACCAGCGCCGCCCGCTACGACTACCACTGCGGCCTGTCCGATGCCCTGCAGCCTGCCCCCGAGGCTGGACAGTGGTTCCAGGCCTACTTCGAGCAGCTGCTCACCAACGCCAACCCGCCCTTCTAA (SEQ NO: 155)MAKKLFITAALAAAVLAAPVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPPPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVHNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVPTLSRVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGAANYRSYIDAIRKHIKEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIEAFSPLLNDAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWFQAYFEQLLTNANPPF (SEQ NO: 
156)APVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPPPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVHNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVPTLSRVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGAANYRSYIDAIRKHIKEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIEAFSPLLNDAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWF QAYFEQLLTNANPPF

The polynucleotide (SEQ ID NO:157) and amino acid (SEQ ID NO:158) sequences of an M. thermophila CBH2b variant (“Variant 962”) are provided below. The signal sequence is shown underlined in SEQ ID NO:158. SEQ ID NO:159 provides the sequence of this CBH2b variant, without the signal sequence.

(SEQ NO: 157) ATGGCCAAGAAGCTTTTCATCACCGCCGCGCTTGCGGCTGCCGTGTTGGCGGCCCCCGTCATTGAGGAGCGCCAGAACTGCGGCGCTGTGTGGACTCAATGCGGCGGTAACGGGTGGCAAGGTCCCACATGCTGCGCCTCGGGCTCGACCTGCGTTGCGCAGAACGAGTGGTACTCTCAGTGCCTGCCCAACAGCCAGGTGACGAGTTCCACCACTCCGTCGTCGACTTCCACCTCGCAGCGCAGCACCAGCACCTCCAGCAGCACCACCAGGAGCGGCAGCTCCTCCTCCTCCTCCACCACGCCCACCCCCGTCTCCAGCCCCGTGACCAGCATTCCCGGCGGTGCGACCTCCACGGCGAGCTACTCTGGCAACCCCTTCTCGGGCGTCCGGCTCTTCGCCAACGACTACTACAGGTCCGAGGTCATGAATCTCGCCATTCCTAGCATGACTGGTACTCTGGCGGCCAAGGCTTCCGCCGTCGCCGAAGTCCCTAGCTTCCAGTGGCTCGACCGGAACGTCACCATCGACACCCTGATGGTCACCACTCTGTCCCAGGTCCGGGCTCTCAATAAGGCCGGTGCCAATCCTCCCTATGCTGCCCAACTCGTCGTCTACGACCTCCCCGACCGTGACTGTGCCGCCGCTGCGTCCAACGGCGAGTTTTCGATTGCAAACGGCGGCAGCGCCAACTACAGGAGCTACATCGACGCTATCCGCAAGCACATCATTGAGTACTCGGACATCCGGATCATCCTGGTTATCGAGCCCGACTCGATGGCCAACATGGTGACCAACATGAACGTGGCCAAGTGCAGCAACGCCGCGTCGACGTACCACGAGTTGACCGTGTACGCGCTCAAGCAGCTGAACCTGCCCAACGTCGCCATGTATCTCGACGCCGGCCACGCCGGCTGGCTCGGCTGGCCCGCCAACATCCAGCCCGCCGCCGAGCTGTTTGCCGGCATCTACAATGATGCCGGCAAGCCGGCTGCCGTCCGCGGCCTGGCCACTAACGTCGCCAACTACAACGCCTGGAGCATCGCTTCGGCCCCGTCGTACACGCAGCCTAACCCTAACTACGACGAGAAGCACTACATCGAGGCCTTCAGCCCGCTCTTGAACTCGGCCGGCTTCCCCGCACGCTTCATTGTCGACACTGGCCGCAACGGCAAACAACCTACCGGCCAACAACAGTGGGGTGACTGGTGCAATGTCAAGGGCACCGGCTTTGGCGTGCGCCCGACGGCCAACACGGGCCACGAGCTGGTCGATGCCTTTGTCTGGGTCAAGCCCGGCGGCGAGTCCGACGGCACAAGCGACACCAGCGCCGCCCGCTACGACTACCACTGCGGCCTGTCCGATGCCCTGCAGCCTGCCCCCGAGGCTGGACAGTGGTTCCAGGCCTACTTCGAGCAGCTGCTCACCAACGCCAACCCGCCCTTCTAA (SEQ NO: 158)MAKKLFITAALAAAVLAAPVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPTPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVMNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVTTLSQVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGSANYRSYIDAIRKHIIEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTQPNPNYDEKHYIEAFSPLLNSAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWFQAYFEQLLTNANPPF (SEQ NO: 
159)APVIEERQNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSQVTSSTTPSSTSTSQRSTSTSSSTTRSGSSSSSSTTPTPVSSPVTSIPGGATSTASYSGNPFSGVRLFANDYYRSEVMNLAIPSMTGTLAAKASAVAEVPSFQWLDRNVTIDTLMVTTLSQVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSIANGGSANYRSYIDAIRKHIIEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVAMYLDAGHAGWLGWPANIQPAAELFAGIYNDGKPAAVRGLATNVANYNAWSIASAPSYTQPNPNYDEKHYIEAFSPLLNSAGFPARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHCGLSDALQPAPEAGQWFQ AYFEQLLTNANPPF

The polynucleotide (SEQ ID NO:160) and amino acid (SEQ ID NO:161) sequences of another wild-type M. thermophila xylanase (“Xyl3”) are provided below. The signal sequence is shown underlined in SEQ ID NO:161. SEQ ID NO:162 provides the sequence of this xylanase without the signal sequence.

(SEQ NO: 160) ATGCACTCCAAAGCTTTCTTGGCAGCGCTTCTTGCGCCTGCCGTCTCAGGGCAACTGAACGACCTCGCCGTCAGGGCTGGACTCAAGTACTTTGGTACTGCTCTTAGCGAGAGCGTCATCAACAGTGATACTCGGTATGCTGCCATCCTCAGCGACAAGAGCATGTTCGGCCAGCTCGTCCCCGAGAATGGCATGAAGTGGGATGCTACTGAGCCGTCCCGTGGCCAGTTCAACTACGCCTCGGGCGACATCACGGCCAACACGGCCAAGAAGAATGGCCAGGGCATGCGTTGCCACACCATGGTCTGGTACAGCCAGCTCCCGAGCTGGGTCTCCTCGGGCTCGTGGACCAGGGACTCGCTCACCTCGGTCATCGAGACGCACATGAACAACGTCATGGGCCACTACAAGGGCCAATGCTACGCCTGGGATGTCATCAACGAGGCCATCAATGACGACGGCAACTCCTGGCGCGACAACGTCTTTCTCCGGACCTTTGGGACCGACTACTTCGCCCTGTCCTTCAACCTAGCCAAGAAGGCCGATCCCGATACCAAGCTGTACTACAACGACTACAACCTCGAGTACAACCAGGCCAAGACGGACCGCGCTGTTGAGCTCGTCAAGATGGTCCAGGCCGCCGGCGCGCCCATCGACGGTGTCGGCTTCCAGGGCCACCTCATTGTCGGCTCGACCCCGACGCGCTCGCAGCTGGCCACCGCCCTCCAGCGCTTCACCGCGCTCGGCCTCGAGGTCGCCTACACCGAGCTCGACATCCGCCACTCGAGCCTGCCGGCCTCTTCGTCGGCGCTCGCGACCCAGGGCAACGACTTCGCCAACGTGGTCGGCTCTTGCCTCGACACCGCCGGCTGCGTCGGCGTCACCGTCTGGGGCTTCACCGATGCGCACTCGTGGATCCCGAACACGTTCCCCGGCCAGGGCGACGCCCTGATCTACGACAGCAACTACAACAAGAAGCCCGCGTGGACCTCGATCTCGTCCGTCCTGGCCGCCAAGGCCACCGGCGCCCCGCCCGCCTCGTCCTCCACCACCCTCGTCACCATCACCACCCCTCCGCCGGCATCCACCACCGCCTCCTCCTCCTCCAGTGCCACGCCCACGAGCGTCCCGACGCAGACGAGGTGGGGACAGTGCGGCGGCATCGGATGGACGGGGCCGACCCAGTGCGAGAGCCCATGGACCTGCCAGAAGCTGAACGACTGGTACTGGCAGTGCCTG (SEQ NO: 161)MHSKAFLAALLAPAVSGQLNDLAVRAGLKYFGTALSESVINSDTRYAAILSDKSMFGQLVPENGMKWDATEPSRGQFNYASGDITANTAKKNGQGMRCHTMVWYSQLPSWVSSGSWTRDSLTSVIETHMNNVMGHYKGQCYAWDVINEAINDDGNSWRDNVFLRTFGTDYFALSFNLAKKADPDTKLYYNDYNLEYNQAKTDRAVELVKMVQAAGAPIDGVGFQGHLIVGSTPTRSQLATALQRFTALGLEVAYTELDIRHSSLPASSSALATQGNDFANVVGSCLDTAGCVGVTVWGFTDAHSWIPNTFPGQGDALIYDSNYNKKPAWTSISSVLAAKATGAPPASSSTTLVTITTPPPASTTASSSSSATPTSVPTQTRWGQCGGIGWTGPTQCESPW TCQKLNDWYWQCL (SEQNO: 162) 
QLNDLAVRAGLKYFGTALSESVINSDTRYAAILSDKSMFGQLVPENGMKWDATEPSRGQFNYASGDITANTAKKNGQGMRCHTMVWYSQLPSWVSSGSWTRDSLTSVIETHMNNVMGHYKGQCYAWDVINEAINDDGNSWRDNVFLRTFGTDYFALSFNLAKKADPDTKLYYNDYNLEYNQAKTDRAVELVKMVQAAGAPIDGVGFQGHLIVGSTPTRSQLATALQRFTALGLEVAYTELDIRHSSLPASSSALATQGNDFANVVGSCLDTAGCVGVTVWGFTDAHSWIPNTFPGQGDALIYDSNYNKKPAWTSISSVLAAKATGAPPASSSTTLVTITTPPPASTTASSSSSATPTSVPTQTRWGQCGGIGWTGPTQCESPWTCQKLNDWYWQCL

The polynucleotide (SEQ ID NO:163) and amino acid (SEQ ID NO:164) sequences of a wild-type M. thermophila xylanase (“Xyl2”) are provided below. The signal sequence is shown underlined in SEQ ID NO:164. SEQ ID NO:165 provides the sequence of this xylanase without the signal sequence.

(SEQ NO: 163) ATGGTCTCGTTCACTCTCCTCCTCACGGTCATCGCCGCTGCGGTGACGACGGCCAGCCCTCTCGAGGTGGTCAAGCGCGGCATCCAGCCGGGCACGGGCACCCACGAGGGGTACTTCTACTCGTTCTGGACCGACGGCCGTGGCTCGGTCGACTTCAACCCCGGGCCCCGCGGCTCGTACAGCGTCACCTGGAACAACGTCAACAACTGGGTTGGCGGCAAGGGCTGGAACCCGGGCCCGCCGCGCAAGATTGCGTACAACGGCACCTGGAACAACTACAACGTGAACAGCTACCTCGCCCTGTACGGCTGGACTCGCAACCCGCTGGTCGAGTATTACATCGTGGAGGCATACGGCACGTACAACCCCTCGTCGGGCACGGCGCGGCTGGGCACCATCGAGGACGACGGCGGCGTGTACGACATCTACAAGACGACGCGGTACAACCAGCCGTCCATCGAGGGGACCTCCACCTTCGACCAGTACTGGTCCGTCCGCCGCCAGAAGCGCGTCGGCGGCACTATCGACACGGGCAAGCACTTTGACGAGTGGAAGCGCCAGGGCAACCTCCAGCTCGGCACCTGGAACTACATGATCATGGCCACCGAGGGCTACCAGAGCTCTGGTTCGGCCACTATCGAGGTCCGGGA GGCC (SEQ NO: 164)MVSFTLLLTVIAAAVTTASPLEVVKRGIQPGTGTHEGYFYSFWTDGRGSVDFNPGPRGSYSVTWNNVNNWVGGKGWNPGPPRKIAYNGTWNNYNVNSYLALYGWTRNPLVEYYIVEAYGTYNPSSGTARLGTIEDDGGVYDIYKTTRYNQPSIEGTSTFDQYWSVRRQKRVGGTIDTGKHFDEWKRQGNLQLGTWNYMIM ATEGYQSSGSATIEVREA(SEQ NO: 165) MVSFTLLLTVIAAAVTTASPLEVVKRGIQPGTGTHEGYFYSFWTDGRGSVDFNPGPRGSYSVTWNNVNNWVGGKGWNPGPPRKIAYNGTWNNYNVNSYLALYGWTRNPLVEYYIVEAYGTYNPSSGTARLGTIEDDGGVYDIYKTTRYNQPSIEGTSTFDQYWSVRRQKRVGGTIDTGKHFDEWKRQGNLQLGTWNYMIM ATEGYQSSGSATIEVREA

The polynucleotide (SEQ ID NO:166) and amino acid (SEQ ID NO:167) sequences of another wild-type M. thermophila xylanase (“Xyl1”) are provided below. The signal sequence is shown underlined in SEQ ID NO:167. SEQ ID NO:168 provides the sequence of this xylanase without the signal sequence.

(SEQ NO: 166) ATGCGTACTCTTACGTTCGTGCTGGCAGCCGCCCCGGTGGCTGTGCTTGCCCAATCTCCTCTGTGGGGCCAGTGCGGCGGTCAAGGCTGGACAGGTCCCACGACCTGCGTTTCTGGCGCAGTATGCCAATTCGTCAATGACTGGTACTCCCAATGCGTGCCCGGATCGAGCAACCCTCCTACGGGCACCACCAGCAGCACCACTGGAAGCACCCCGGCTCCTACTGGCGGCGGCGGCAGCGGAACCGGCCTCCACGACAAATTCAAGGCCAAGGGCAAGCTCTACTTCGGAACCGAGATCGATCACTACCATCTCAACAACAATGCCTTGACCAACATTGTCAAGAAAGACTTTGGTCAAGTCACTCACGAGAACAGCTTGAAGTGGGATGCTACTGAGCCGAGCCGCAATCAATTCAACTTTGCCAACGCCGACGCGGTTGTCAACTTTGCCCAGGCCAACGGCAAGCTCATCCGCGGCCACACCCTCCTCTGGCACTCTCAGCTGCCGCAGTGGGTGCAGAACATCAACGACCGCAACACCTTGACCCAGGTCATCGAGAACCACGTCACCACCCTTGTCACTCGCTACAAGGGCAAGATCCTCCACTGGGACGTCGTTAACGAGATCTTTGCCGAGGACGGCTCGCTCCGCGACAGCGTCTTCAGCCGCGTCCTCGGCGAGGACTTTGTCGGCATCGCCTTCCGCGCCGCCCGCGCCGCCGATCCCAACGCCAAGCTCTACATCAACGACTACAACCTCGACATTGCCAACTACGCCAAGGTGACCCGGGGCATGGTCGAGAAGGTCAACAAGTGGATCGCCCAGGGCATCCCGATCGACGGCATCGGCACCCAGTGCCACCTGGCCGGGCCCGGCGGGTGGAACACGGCCGCCGGCGTCCCCGACGCCCTCAAGGCCCTCGCCGCGGCCAACGTCAAGGAGATCGCCATCACCGAGCTCGACATCGCCGGCGCCTCCGCCAACGACTACCTCACCGTCATGAACGCCTGCCTCCAGGTCTCCAAGTGCGTCGGCATCACCGTCTGGGGCGTCTCTGACAAGGACAGCTGGAGGTCGAGCAGCAACCCGCTCCTCTTCGACAGCAACTACCAGCCAAAGGCGGCATACAATGCTCTGATTAATGCCT TGTAA (SEQ NO: 167)MRTLTFVLAAAPVAVLAQSPLWGQCGGQGWTGPTTCVSGAVCQFVNDWYSQCVPGSSNPPTGTTSSTTGSTPAPTGGGGSGTGLHDKFKAKGKLYFGTEIDHYHLNNNALTNIVKKDFGQVTHENSLKWDATEPSRNQFNFANADAVVNFAQANGKLIRGHTLLWHSQLPQWVQNINDRNTLTQVIENHVTTLVTRYKGKILHWDVVNEIFAEDGSLRDSVFSRVLGEDFVGIAFRAARAADPNAKLYINDYNLDIANYAKVTRGMVEKVNKWIAQGIPIDGIGTQCHLAGPGGWNTAAGVPDALKALAAANVKEIAITELDIAGASANDYLTVMNACLQVSKCVGITVWGVSDKDSWRSSSNPLLFDSNYQPKAAYNALINAL (SEQ NO: 168)QSPLWGQCGGQGWTGPTTCVSGAVCQFVNDWYSQCVPGSSNPPTGTTSSTTGSTPAPTGGGGSGTGLHDKFKAKGKLYFGTEIDHYHLNNNALTNIVKKDFGQVTHENSLKWDATEPSRNQFNFANADAVVNFAQANGKLIRGHTLLWHSQLPQWVQNINDRNTLTQVIENHVTTLVTRYKGKILHWDVVNEIFAEDGSLRDSVFSRVLGEDFVGIAFRAARAADPNAKLYINDYNLDIANYAKVTRGMVEKVNKWIAQGIPIDGIGTQCHLAGPGGWNTAAGVPDALKALAAANVKEIAITELDIAGASANDYLTVMNACLQVSKCVGITVWGVSDKDSWRSSSNPLLF DSNYQPKAAYNALINAL

The polynucleotide (SEQ ID NO:169) and amino acid (SEQ ID NO:170) sequences of another wild-type M. thermophila xylanase (“Xyl6”) are provided below. The signal sequence is shown underlined in SEQ ID NO:170. SEQ ID NO:171 provides the sequence of this xylanase without the signal sequence.

(SEQ NO: 169) ATGGTCTCGCTCAAGTCCCTCCTCCTCGCCGCGGCGGCGACGTTGACGGCGGTGACGGCGCGCCCGTTCGACTTTGACGACGGCAACTCGACCGAGGCGCTGGCCAAGCGCCAGGTCACGCCCAACGCGCAGGGCTACCACTCGGGCTACTTCTACTCGTGGTGGTCCGACGGCGGCGGCCAGGCCACCTTCACCCTGCTCGAGGGCAGCCACTACCAGGTCAACTGGAGGAACACGGGCAACTTTGTCGGTGGCAAGGGCTGGAACCCGGGTACCGGCCGGACCATCAACTACGGCGGCTCGTTCAACCCGAGCGGCAACGGCTACCTGGCCGTCTACGGCTGGACGCACAACCCGCTGATCGAGTACTACGTGGTCGAGTCGTACGGGACCTACAACCCGGGCAGCCAGGCCCAGTACAAGGGCAGCTTCCAGAGCGACGGCGGCACCTACAACATCTACGTCTCGACCCGCTACAACGCGCCCTCGATCGAGGGCACCCGCACCTTCCAGCAGTACTGGTCCATCCGCACCTCCAAGCGCGTCGGCGGCTCCGTCACCATGCAGAACCACTTCAACGCCTGGGCCCAGCACGGCATGCCCCTCGGCTCCCACGACTACCAGATCGTCGCCACCGAGGGCTACCAGAGCAGCGGCTCCTCCGACATCTACGTCCAGACTCACTAG (SEQ NO: 170)MVSLKSLLLAAAATLTAVTARPFDFDDGNSTEALAKRQVTPNAQGYHSGYFYSWWSDGGGQATFTLLEGSHYQVNWRNTGNFVGGKGWNPGTGRTINYGGSFNPSGNGYLAVYGWTHNPLIEYYVVESYGTYNPGSQAQYKGSFQSDGGTYNIYVSTRYNAPSIEGTRTFQQYWSIRTSKRVGGSVTMQNHFNAWAQHGMPLGSHDYQIVATEGYQSSGSSDIYVQTH (SEQ NO: 171)RPFDFDDGNSTEALAKRQVTPNAQGYHSGYFYSWWSDGGGQATFTLLEGSHYQVNWRNTGNFVGGKGWNPGTGRTINYGGSFNPSGNGYLAVYGWTHNPLIEYYVVESYGTYNPGSQAQYKGSFQSDGGTYNIYVSTRYNAPSIEGTRTFQQYWSIRTSKRVGGSVTMQNHFNAWAQHGMPLGSHDYQIVATEGYQSSGS SDIYVQTH

The polynucleotide (SEQ ID NO:172) and amino acid (SEQ ID NO:173) sequences of another wild-type M. thermophila xylanase (“Xyl5”) are provided below. The signal sequence is shown underlined in SEQ ID NO:173. SEQ ID NO:174 provides the sequence of this xylanase, without the signal sequence.

(SEQ NO: 172) ATGGTTACCCTCACTCGCCTGGCGGTCGCCGCGGCGGCCATGATCTCCAGCACTGGCCTGGCTGCCCCGACGCCCGAAGCTGGCCCCGACCTTCCCGACTTTGAGCTCGGGGTCAACAACCTCGCCCGCCGCGCGCTGGACTACAACCAGAACTACAGGACCAGCGGCAACGTCAACTACTCGCCCACCGACAACGGCTACTCGGTCAGCTTCTCCAACGCGGGAGATTTTGTCGTCGGGAAGGGCTGGAGGACGGGAGCCACCAGAAACATCACCTTCTCGGGATCGACACAGCATACCTCGGGCACCGTGCTCGTCTCCGTCTACGGCTGGACCCGGAACCCGCTGATCGAGTACTACGTGCAGGAGTACACGTCCAACGGGGCCGGCTCCGCTCAGGGCGAGAAGCTGGGCACGGTCGAGAGCGACGGGGGCACGTACGAGATCTGGCGGCACCAGCAGGTCAACCAGCCGTCGATCGAGGGCACCTCGACCTTCTGGCAGTACATCTCGAACCGCGTGTCCGGCCAGCGGCCCAACGGCGGCACCGTCACCCTCGCCAACCACTTCGCCGCCTGGCAGAAGCTCGGCCTGAACCTGGGCCAGCACGACTACCAGGTCCTGGCCACCGAGGGCTGGGGCAACGCCGGCGGCAGCTCCCAGTACACCGTCAGCGGCTGA (SEQ NO: 173)MVTLTRLAVAAAAMISSTGLAAPTPEAGPDLPDFELGVNNLARRALDYNQNYRTSGNVNYSPTDNGYSVSFSNAGDFVVGKGWRTGATRNITFSGSTQHTSGTVLVSVYGWTRNPLIEYYVQEYTSNGAGSAQGEKLGTVESDGGTYEIWRHQQVNQPSIEGTSTFWQYISNRVSGQRPNGGTVTLANHFAAWQKLGLNLGQHDYQVLATEGWGNAGGSSQYTVSG (SEQ NO: 174)APTPEAGPDLPDFELGVNNLARRALDYNQNYRTSGNVNYSPTDNGYSVSFSNAGDFVVGKGWRTGATRNITFSGSTQHTSGTVLVSVYGWTRNPLIEYYVQEYTSNGAGSAQGEKLGTVESDGGTYEIWRHQQVNQPSIEGTSTFWQYISNRVSGQRPNGGTVTLANHFAAWQKLGLNLGQHDYQVLATEGWGNAGGSSQ YTVSG

The polynucleotide (SEQ ID NO:175) and amino acid (SEQ ID NO:176) sequences of a wild-type M. thermophila beta-xylosidase are provided below. The signal sequence is shown underlined in SEQ ID NO:176. SEQ ID NO:177 provides the sequence of this beta-xylosidase without the signal sequence.

(SEQ NO: 175) ATGTTCTTCGCTTCTCTGCTGCTCGGTCTCCTGGCGGGCGTGTCCGCTTCACCGGGACACGGGCGGAATTCCACCTTCTACAACCCCATCTTCCCCGGCTTCTACCCCGATCCGAGCTGCATCTACGTGCCCGAGCGTGACCACACCTTCTTCTGTGCCTCGTCGAGCTTCAACGCCTTCCCGGGCATCCCGATTCATGCCAGCAAGGACCTGCAGAACTGGAAGTTGATCGGCCATGTGCTGAATCGCAAGGAACAGCTTCCCCGGCTCGCTGAGACCAACCGGTCGACCAGCGGCATCTGGGCACCCACCCTCCGGTTCCATGACGACACCTTCTGGTTGGTCACCACACTAGTGGACGACGACCGGCCGCAGGAGGACGCTTCCAGATGGGACAATATTATCTTCAAGGCAAAGAATCCGTATGATCCGAGGTCCTGGTCCAAGGCCGTCCACTTCAACTTCACTGGCTACGACACGGAGCCTTTCTGGGACGAAGATGGAAAGGTGTACATCACCGGCGCCCATGCTTGGCATGTTGGCCCATACATCCAGCAGGCCGAAGTCGATCTCGACACGGGGGCCGTCGGCGAGTGGCGCATCATCTGGAACGGAACGGGCGGCATGGCTCCTGAAGGGCCGCACATCTACCGCAAAGATGGGTGGTACTACTTGCTGGCTGCTGAAGGGGGGACCGGCATCGACCATATGGTGACCATGGCCCGGTCGAGAAAAATCTCCAGTCCTTACGAGTCCAACCCAAACAACCCCGTGTTGACCAACGCCAACACGACCAGTTACTTTCAAACCGTCGGGCATTCAGACCTGTTCCATGACAGACATGGGAACTGGTGGGCAGTCGCCCTCTCCACCCGCTCCGGTCCAGAATATCTTCACTACCCCATGGGCCGCGAGACCGTCATGACAGCCGTGAGCTGGCCGAAGGACGAGTGGCCAACCTTCACCCCCATATCTGGCAAGATGAGCGGCTGGCCGATGCCTCCTTCGCAGAAGGACATTCGCGGAGTCGGCCCCTACGTCAACTCCCCCGACCCGGAACACCTGACCTTCCCCCGCTCGGCGCCCCTGCCGGCCCACCTCACCTACTGGCGATACCCGAACCCGTCCTCCTACACGCCGTCCCCGCCCGGGCACCCCAACACCCTCCGCCTGACCCCGTCCCGCCTGAACCTGACCGCCCTCAACGGCAACTACGCGGGGGCCGACCAGACCTTCGTCTCGCGCCGGCAGCAGCACACCCTCTTCACCTACAGCGTCACGCTCGACTACGCGCCGCGGACCGCCGGGGAGGAGGCCGGCGTGACCGCCTTCCTGACGCAGAACCACCACCTCGACCTGGGCGTCGTCCTGCTCCCTCGCGGCTCCGCCACCGCGCCCTCGCTGCCGGGCCTGAGTAGTAGTACAACTACTACTAGTAGTAGTAGTAGTCGTCCGGACGAGGAGGAGGAGCGCGAGGCGGGCGAAGAGGAAGAAGAGGGCGGACAAGACTTGATGATCCCGCATGTGCGGTTCAGGGGCGAGTCGTACGTGCCCGTCCCGGCGCCCGTCGTGTACCCGATACCCCGGGCCTGGAGAGGCGGGAAGCTTGTGTTAGAGATCCGGGCTTGTAATTCGACTCACTTCTCGTTCCGTGTCGGGCCGGACGGGAGACGGTCTGAGCGGACGGTGGTCATGGAGGCTTCGAACGAGGCCGTTAGCTGGGGCTTTACTGGAACGCTGCTGGGCATCTATGCGACCAGTAATGGTGGCAACGGAACCACGCCGGCGTATTTTTCGGATTGGAGGTACACACCATTGGAGCAGTTTAGGGAT (SEQ NO: 
176)MFFASLLLGLLAGVSASPGHGRNSTFYNPIFPGFYPDPSCIYVPERDHTFFCASSSFNAFPGIPIHASKDLQNWKLIGHVLNRKEQLPRLAETNRSTSGIWAPTLRFHDDTFWLVTTLVDDDRPQEDASRWDNIIFKAKNPYDPRSWSKAVHFNFTGYDTEPFWDEDGKVYITGAHAWHVGPYIQQAEVDLDTGAVGEWRIIWNGTGGMAPEGPHIYRKDGWYYLLAAEGGTGIDHMVTMARSRKISSPYESNPNNPVLTNANTTSYFQTVGHSDLFHDRHGNWWAVALSTRSGPEYLHYPMGRETVMTAVSWPKDEWPTFTPISGKMSGWPMPPSQKDIRGVGPYVNSPDPEHLTFPRSAPLPAHLTYWRYPNPSSYTPSPPGHPNTLRLTPSRLNLTALNGNYAGADQTFVSRRQQHTLFTYSVTLDYAPRTAGEEAGVTAFLTQNHHLDLGVVLLPRGSATAPSLPGLSSSTTTTSSSSSRPDEEEEREAGEEEEEGGQDLMIPHVRFRGESYVPVPAPVVYPIPRAWRGGKLVLEIRACNSTHFSFRVGPDGRRSERTVVMEASNEAVSWGFTGTLLGIYATSNGGNGTTPAYFSD WRYTPLEQFRD (SEQ NO:177) SPGHGRNSTFYNPIFPGFYPDPSCIYVPERDHTFFCASSSFNAFPGIPIHASKDLQNWKLIGHVLNRKEQLPRLAETNRSTSGIWAPTLRFHDDTFWLVTTLVDDDRPQEDASRWDNIIFKAKNPYDPRSWSKAVHFNFTGYDTEPFWDEDGKVYITGAHAWHVGPYIQQAEVDLDTGAVGEWRIIWNGTGGMAPEGPHIYRKDGWYYLLAAEGGTGIDHMVTMARSRKISSPYESNPNNPVLTNANTTSYFQTVGHSDLFHDRHGNWWAVALSTRSGPEYLHYPMGRETVMTAVSWPKDEWPTFTPISGKMSGWPMPPSQKDIRGVGPYVNSPDPEHLTFPRSAPLPAHLTYWRYPNPSSYTPSPPGHPNTLRLTPSRLNLTALNGNYAGADQTFVSRRQQHTLFTYSVTLDYAPRTAGEEAGVTAFLTQNHHLDLGVVLLPRGSATAPSLPGLSSSTTTTSSSSSRPDEEEEREAGEEEEEGGQDLMIPHVRFRGESYVPVPAPVVYPIPRAWRGGKLVLEIRACNSTHFSFRVGPDGRRSERTVVMEASNEAVSWGFTGTLLGIYATSNGGNGTTPAYFSDWRYTPLEQFRD

The polynucleotide (SEQ ID NO:178) and amino acid (SEQ ID NO:179) sequences of a wild-type M. thermophila acetylxylan esterase (“Axe3”) are provided below. The signal sequence is shown underlined in SEQ ID NO:179. SEQ ID NO:180 provides the sequence of this acetylxylan esterase without the signal sequence.

(SEQ NO: 178) ATGAAGCTCCTGGGCAAACTCTCGGCGGCACTCGCCCTCGCGGGCAGCAGGCTGGCTGCCGCGCACCCGGTCTTCGACGAGCTGATGCGGCCGACGGCGCCGCTGGTGCGCCCGCGGGCGGCCCTGCAGCAGGTGACCAACTTTGGCAGCAACCCGTCCAACACGAAGATGTTCATCTACGTGCCCGACAAGCTGGCCCCCAACCCGCCCATCATAGTGGCCATCCACTACTGCACCGGCACCGCCCAGGCCTACTACTCGGGCTCCCCTTACGCCCGCCTCGCCGACCAGAAGGGCTTCATCGTCATCTACCCGGAGTCCCCCTACAGCGGCACCTGTTGGGACGTCTCGTCGCGCGCCGCCCTGACCCACAACGGCGGCGGCGACAGCAACTCGATCGCCAACATGGTCACCTACACCCTCGAAAAGTACAATGGCGACGCCAGCAAGGTCTTTGTCACCGGCTCCTCGTCCGGCGCCATGATGACGAACGTGATGGCCGCCGCGTACCCGGAACTGTTCGCGGCAGGAATCGCCTACTCGGGCGTGCCCGCCGGCTGCTTCTACAGCCAGTCCGGAGGCACCAACGCGTGGAACAGCTCGTGCGCCAACGGGCAGATCAACTCGACGCCCCAGGTGTGGGCCAAGATGGTCTTCGACATGTACCCGGAATACGACGGCCCGCGCCCCAAGATGCAGATCTACCACGGCTCGGCCGACGGCACGCTCAGACCCAGCAACTACAACGAGACCATCAAGCAGTGGTGCGGCGTCTTCGGCTTCGACTACACCCGCCCCGACACCACCCAGGCCAACTCCCCGCAGGCCGGCTACACCACCTACACCTGGGGCGAGCAGCAGCTCGTCGGCATCTACGCCCAGGGCGTCGGACACACGGTCCCCATCCGCGGCAGCGACGACATGGCCTTCTTTGGCCTGTGA (SEQ NO: 179)MKLLGKLSAALALAGSRLAAAHPVFDELMRPTAPLVRPRAALQQVTNFGSNPSNTKMFIYVPDKLAPNPPIIVAIHYCTGTAQAYYSGSPYARLADQKGFIVIYPESPYSGTCWDVSSRAALTHNGGGDSNSIANMVTYTLEKYNGDASKVFVTGSSSGAMMTNVMAAAYPELFAAGIAYSGVPAGCFYSQSGGTNAWNSSCANGQINSTPQVWAKMVFDMYPEYDGPRPKMQIYHGSADGTLRPSNYNETIKQWCGVFGFDYTRPDTTQANSPQAGYTTYTWGEQQLVGIYAQGVGHTV PIRGSDDMAFFGL (SEQNO: 180) HPVFDELMRPTAPLVRPRAALQQVTNFGSNPSNTKMFIYVPDKLAPNPPIIVAIHYCTGTAQAYYSGSPYARLADQKGFIVIYPESPYSGTCWDVSSRAALTHNGGGDSNSIANMVTYTLEKYNGDASKVFVTGSSSGAMMTNVMAAAYPELFAAGIAYSGVPAGCFYSQSGGTNAWNSSCANGQINSTPQVWAKMVFDMYPEYDGPRPKMQIYHGSADGTLRPSNYNETIKQWCGVFGFDYTRPDTTQANSPQAGYTTYTWGEQQLVGIYAQGVGHTVPIRGSDDMAFFGL

The polynucleotide (SEQ ID NO:181) and amino acid (SEQ ID NO:182) sequences of a wild-type M. thermophila ferulic acid esterase (“FAE”) are provided below. The signal sequence is shown underlined in SEQ ID NO:182. SEQ ID NO:183 provides the sequence of this ferulic acid esterase without the signal sequence.

(SEQ NO: 181) ATGATCTCGGTTCCTGCTCTCGCTCTGGCCCTTCTGGCCGCCGTCCAGGTCGTCGAGTCTGCCTCGGCTGGCTGTGGCAAGGCGCCCCCTTCCTCGGGCACCAAGTCGATGACGGTCAACGGCAAGCAGCGCCAGTACATTCTCCAGCTGCCCAACAACTACGACGCCAACAAGGCCCACAGGGTGGTGATCGGGTACCACTGGCGCGACGGATCCATGAACGACGTGGCCAACGGCGGCTTCTACGATCTGCGGTCCCGGGCGGGCGACAGCACCATCTTCGTTGCCCCCAACGGCCTCAATGCCGGATGGGCCAACGTGGGCGGCGAGGACATCACCTTTACGGACCAGATCGTAGACATGCTCAAGAACGACCTCTGCGTGGACGAGACCCAGTTCTTTGCTACGGGCTGGAGCTATGGCGGTGCCATGAGCCATAGCGTGGCTTGTTCTCGGCCAGACGTCTTCAAGGCCGTCGCGGTCATCGCCGGGGCCCAGCTGTCCGGCTGCGCCGGCGGCACGACGCCCGTGGCGTACCTAGGCATCCACGGAGCCGCCGACAACGTCCTGCCCATCGACCTCGGCCGCCAGCTGCGCGACAAGTGGCTGCAGACCAACGGCTGCAACTACCAGGGCGCCCAGGACCCCGCGCCGGGCCAGCAGGCCCACATCAAGACCACCTACAGCTGCTCCCGCGCGCCCGTCACCTGGATCGGCCACGGGGGCGGCCACGTCCCCGACCCCACGGGCAACAACGGCGTCAAGTTTGCGCCCCAGGAGACCTGGGACTTCTTTGATGCCGCCGTCGGAGCGGCCGGCGCGCAGAGCCCGATGACATAA (SEQ NO: 182)MISVPALALALLAAVQVVESASAGCGKAPPSSGTKSMTVNGKQRQYILQLPNNYDANKAHRVVIGYHWRDGSMNDVANGGFYDLRSRAGDSTIFVAPNGLNAGWANVGGEDITFTDQIVDMLKNDLCVDETQFFATGWSYGGAMSHSVACSRPDVFKAVAVIAGAQLSGCAGGTTPVAYLGIHGAADNVLPIDLGRQLRDKWLQTNGCNYQGAQDPAPGQQAHIKTTYSCSRAPVTWIGHGGGHVPDPTGNNGVKFAPQETWDFFDAAVGAAGAQSPMT (SEQ NO: 183)ASAGCGKAPPSSGTKSMTVNGKQRQYILQLPNNYDANKAHRVVIGYHWRDGSMNDVANGGFYDLRSRAGDSTIFVAPNGLNAGWANVGGEDITFTDQIVDMLKNDLCVDETQFFATGWSYGGAMSHSVACSRPDVFKAVAVIAGAQLSGCAGGTTPVAYLGIHGAADNVLPIDLGRQLRDKWLQTNGCNYQGAQDPAPGQQAHIKTTYSCSRAPVTWIGHGGGHVPDPTGNNGVKFAPQETWDFFDAAVG AAGAQSPMT

Example 1 Protease Deletion Strain Production and Testing

In this Example, methods used to produce M. thermophila strains deficient in protease production are described.

Method One:

Genomic DNA was isolated from an M. thermophila strain (“CF-409”) that contained a deletion of the alp1 gene. The DNA was isolated using the following method: hyphal inoculum was seeded into a standard fungal growth medium and allowed to grow for 72 hours at 35° C. The mycelial mat was collected and genomic DNA was extracted using standard methods known in the art.

A 1 kb DNA fragment from the internal region of gene “contig_1809.g1” was amplified from genomic M. thermophila DNA using primers cdxp001 (SEQ ID NO:184) and cdxp002 (SEQ ID NO:185), shown below. The PCR reaction was performed using PHUSION® polymerase (NEB) in PHUSION® GC buffer (NEB) at 98° C. for 30 sec., followed by 35 cycles of 98° C. for 10 sec., 72° C. for 1 min, and final extension at 72° C. for 5 min. The resultant DNA fragment was cloned into plasmid C1V16.1809.g1 (See, FIG. 1) using the IN-FUSION® cloning technique (IN-FUSION® Advantage PCR cloning kit with cloning enhancer, Clontech, Cat. No. 639617), using the manufacturer's protocol.

Primer Name   Sequence (5′-3′)
cdxp001       ACCGCGGTGGCGGCCAGGTTCGTTCGTCGTCTCATGTGT (SEQ ID NO: 184)
cdxp002       CAATAGACATCAGCATCCGGCCAACGAAGAAGGAAAGTA (SEQ ID NO: 185)

Protoplast Preparation

First, 10⁶ spores/ml of M. thermophila cells (W1L100LΔAlp1Δchi1Δpyr5Δbgl1::pyr5Δku70::Hyg) were inoculated into 100 ml standard fungal growth medium. The culture was incubated for 24 hours at 35° C., 250 rpm. To harvest the mycelium, the culture was filtered through a sterile Myracloth filter (Calbiochem) and washed with 100 ml 1700 mosmol NaCl/CaCl₂ solution (0.6 M NaCl, 0.27 M CaCl₂*H₂O). The washed mycelia were transferred into a clean tube and weighed. Caylase (20 mg/g mycelia) was dissolved in 1700 mosmol NaCl/CaCl₂ and UV-sterilized for 90 sec. Then, 3 ml of sterile Caylase solution was added to the washed mycelia and mixed. Then, 15 ml of 1700 mosmol NaCl/CaCl₂ solution was added into the tube and mixed. The mycelia/Caylase suspension was incubated at 30° C., 70 rpm for 2 hours. Protoplasts were harvested by filtering through a sterile Myracloth filter into a sterile 50 ml tube. Then, 25 ml cold STC (1.2 M sorbitol, 50 mM CaCl₂*H₂O, 35 mM NaCl, 10 mM Tris-HCl) was added to the flow-through and the protoplasts were spun down at 2720 rpm for 10 min at 4° C. The pellet was re-suspended in 50 ml STC and centrifuged again. After the washing steps, the pellet was re-suspended in 1 ml STC.

Transformation

Transformation was carried out in M. thermophila strain (W1L100LΔAlp1Δchi1Δpyr5Δbgl1::pyr5Δku70::Hyg) protoplasts, where homologous integration of the construct would disrupt contig_1809.g1, as described below. First, 5 μg plasmid DNA, 1 μl aurintricarboxylic acid, and 100 μl of the protoplast suspension were mixed together and incubated at room temperature for 25 min. Then, 1.7 ml PEG4000 solution (60% PEG4000 [polyethylene glycol, average molecular weight 4000 daltons], 50 mM CaCl₂*H₂O, 35 mM NaCl, 10 mM Tris-HCl) was added and mixed thoroughly. The solution was kept at room temperature for 20 min. The tube was filled with STC, mixed and centrifuged at 2500 rpm for 10 min at 4° C. The STC was poured off and the pellet was re-suspended in the remaining STC and plated on acetamide selective media plates, as known in the art. The plates were incubated for 5 days at 35° C. Colonies were re-streaked and checked by PCR for the presence of the integrated plasmid disrupting the protease coding region.

Testing the Effect of Protease Deletion

The protease-deleted strain was grown in fungal growth medium and incubated at 35° C., 250 rpm, 85% humidity for 2 days. An aliquot (10%) of this culture was then used to inoculate fungal growth medium comprising glucose, amino acids, minerals, and pen/strep, and incubated at 35° C., 300 rpm, 85% humidity for 4 days.

The proteolytic activity present in the fermentation medium was determined in microtiter plate assays. In order to determine whether there was protease activity capable of clipping purified M. thermophila CBH1a in the fermentation medium, purified CBH1a was diluted to 1 g/l in 50 mM Na acetate buffer, pH 5.0 and mixed with fermentation medium supernatant, at a ratio of 1:3 (enzyme:fermentation medium). The control was fermentation medium obtained from a culture of unmodified M. thermophila strain at the same ratio of enzyme to fermentation medium (i.e., 1:3; enzyme:fermentation broth).

In order to determine whether there was protease activity capable of clipping purified M. thermophila GH61a in the fermentation medium, purified GH61a was diluted to 0.5 g/l in 50 mM Na acetate buffer, pH 5.0 and mixed with fermentation medium at a ratio of 1:4 (enzyme:fermentation medium). The control was fermentation medium obtained from a culture of unmodified M. thermophila strain at the same ratio of enzyme to fermentation medium (i.e., 1:4; enzyme:fermentation broth).

Additional controls included 4-fold diluted pure 1 g/l CBH1a in 50 mM Na acetate buffer (pH 5.0), and 5-fold diluted pure 0.5 g/l GH61a in 50 mM Na acetate buffer (pH 5.0). In these experiments, 0.25 volume of 50 mM Na acetate buffer (pH 5.0) was added to each sample.
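The dilution factors quoted for the buffer-only controls follow from the mixing ratios used for the test samples: a 1:3 mix is a 4-fold dilution and a 1:4 mix is a 5-fold dilution. The short Python sketch below works through that arithmetic. The function names are illustrative only, and the additional 0.25 volume of buffer added to each sample is deliberately left out of the calculation.

```python
from fractions import Fraction

def fold_dilution(enzyme_parts, medium_parts):
    """Fold dilution of the enzyme stock when mixed enzyme:medium by volume."""
    return Fraction(enzyme_parts + medium_parts, enzyme_parts)

def diluted_concentration(stock_g_per_l, enzyme_parts, medium_parts):
    """Enzyme concentration (g/l) after mixing; stock is given as a decimal string."""
    return Fraction(stock_g_per_l) / fold_dilution(enzyme_parts, medium_parts)

# CBH1a: 1 g/l stock mixed 1:3 with fermentation medium (or buffer for the control)
cbh1a_fold = fold_dilution(1, 3)
cbh1a_final = diluted_concentration("1", 1, 3)

# GH61a: 0.5 g/l stock mixed 1:4 with fermentation medium (or buffer for the control)
gh61a_fold = fold_dilution(1, 4)
gh61a_final = diluted_concentration("0.5", 1, 4)

print(f"CBH1a: {cbh1a_fold}-fold dilution, {float(cbh1a_final)} g/l")
print(f"GH61a: {gh61a_fold}-fold dilution, {float(gh61a_final)} g/l")
```

Using exact fractions avoids floating-point rounding when the ratios are compared against the stated fold dilutions.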

Samples were taken at time 0 and after 72 h shaking at 38° C., 900 rpm. The 0 time point and 72 h time point samples were run on SDS-PAGE and the proteolytic activity of the fermentation supernatants was assessed based on the level of CBH1a or GH61a lysis in comparison to the controls. The SDS-PAGE results showed that the deletion of the protease encoded by contig_1809.g1 eliminated GH61a and CBH1a clipping, in contrast to the fermentation medium from the unmodified M. thermophila strain.

Example 2 Protease Deletion Strain Development and Testing

In this Example, methods used to produce M. thermophila strains deficient in protease production are described.

Genomic DNA was isolated from the CF-409 strain using standard methods known in the art. Genomic DNA fragments flanking the contig_690.g5 gene were cloned using primers cdxp003 and cdxp004 (upstream homology) and primers cdxp005 and cdxp006 (downstream homology). The PCR reaction was performed using the GOTAQ® polymerase (Promega) at 95° C. for 2 min, followed by 35 cycles of 95° C. for 30 sec., 53° C. for 30 sec., 72° C. for 1 min, and final extension at 72° C. for 5 min. The resultant DNA fragments were cloned into plasmid pUC19, along with a HygR selection marker using the GeneArt cloning technique (GENEART® Seamless Cloning and Assembly Kit, Invitrogen Cat. No. A13288), according to the manufacturer's protocol to create “pUC19-690.g5.” For gene contig_690.g5 knock-out, the split-marker method was employed, as known in the art. The two DNA fragments were amplified from the pUC19-690.g5 plasmid construct by cdxp007-cdxp008 and cdxp009-cdxp010 primers, respectively. The two fragments were co-transformed in equal amounts (2.5 and 2.5 μg) into CF-409 fungal protoplasts to obtain gene deleted strains, as described above.

Primer Name   Sequence (5′-3′)
cdxp003       AAGAGTGCAAGAGTGAAGGCAGGC (SEQ ID NO: 186)
cdxp004       CTAGCACAGTCAGACCTCCACATACCATCGTACTCGCAACTGACGCTCGTT (SEQ ID NO: 187)
cdxp005       GCAGTCGCAGCATTTACATCAGGCTGGTATGTGGAGGTCTGACTGTGCTAG (SEQ ID NO: 188)
cdxp006       GCCCGCTGTCATTCAAGACATTGC (SEQ ID NO: 189)
cdxp007       GCCAAGCTTGCATGCCATCACTGTTGATGACGCTCTCGCT (SEQ ID NO: 190)
cdxp008       TGTTGGCGACCTCGTATTGGGAAT (SEQ ID NO: 191)
cdxp009       TCTCGGAGGGCGAAGAATCTCGTG (SEQ ID NO: 192)
cdxp010       AATTCGAGCTCGGTACTTGTGCATTTACGGTGCTGTGACG (SEQ ID NO: 193)

Transformation was carried out into the UV18#100fΔAlp1Δpyr5Δku70::pyr5 M. thermophila strain. The transformants were incubated for 5 days at 35° C. under standard hygromycin-selective conditions known in the art. Colonies were re-streaked and checked for the deletion of the protease using PCR, as described in Example 1, above.

The protease-deleted strain was grown in a fungal growth medium at pH 5.0 and an unmodified strain (control) was grown in the same fungal growth medium at pH 5.0 and pH 6.7. Protein profiles were compared using 2D gel electrophoresis, using standard methods known in the art. Comparison of the 2D gels showed that CBH1a lysis was significantly reduced in the protease-deleted strain.

Example 3 Protease Deletion Strain Development and Testing

In this Example, methods used to produce M. thermophila strains deficient in protease production are described.

Genomic DNA was isolated from an M. thermophila strain (“CF-409”) with a deletion of the alp1 gene. The DNA was isolated using standard methods known in the art. To produce a knockout of gene 1086.g13 (v4chr4-45825m24; SEQ ID NO:7), the split-marker method was employed, as known in the art. The 3′ and 5′ homolog arms (i.e., “flanks”) of 1086.g13 were amplified from genomic DNA by cdx111006-cdx111007 and cdx111008-cdx111009 primers, respectively, as described below.

Primer Name   Sequence (5′-3′)
cdx111006     CCGTCTCTCCGCATGCCAGAAAGATTCCTTCCCTTGCTCCTTCACACTG (SEQ ID NO: 194)
cdx111007     CCCCTCCCCTACCTATCTTGTGTCT (SEQ ID NO: 195)
cdx111008     GGATAAGAGTGAACAACGACGAGC (SEQ ID NO: 196)
cdx111009     GTAACACCCAATACGCCGGCCGAACAAAAGCCATTCTTCCTCCGAGAC (SEQ ID NO: 197)
cdx10177      TGTTGGCGACCTCGTATTGGGAAT (SEQ ID NO: 198)

Primers were designed with 24 bp long adapters (first 24 bp in primers cdx111006 and cdx111009) complementary to 5′ and 3′ ends of the HYGRO (i.e., hygromycin B phosphotransferase; “hygromycin gene”) selection marker cassette. The 24 bp long adapter part of the cdx111006 primer is complementary to the promoter region of the HYGRO cassette, while the cdx111009 primer carries an adapter complementary to the terminator region of the HYGRO cassette. The whole HYGRO fragment is 2554 bp in length. The overlapping homolog arms of the hygromycin gene are indicated herein as “HYG” and “GRO.” The 1559 bp long HYG arm (5′ arm) was amplified with cdx10176-cdx10177 primers. The 1607 bp long GRO arm (3′ arm) was amplified with cdx10178-cdx10179 primers. The overlap between the HYG and GRO arms is 612 bp long.
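The fragment lengths given above are mutually consistent: two marker arms that share an overlap reconstitute a cassette whose length is the sum of the arms minus the shared region. The following Python sketch checks this with the stated numbers. The function and constant names are illustrative and do not come from the source.

```python
def reconstituted_length(five_prime_arm_bp, three_prime_arm_bp, overlap_bp):
    """Length of the full cassette rebuilt from two arms that share an overlap."""
    return five_prime_arm_bp + three_prime_arm_bp - overlap_bp

HYG_ARM_BP = 1559        # 5' arm of the marker
GRO_ARM_BP = 1607        # 3' arm of the marker
OVERLAP_BP = 612         # region shared by HYG and GRO
HYGRO_CASSETTE_BP = 2554  # full selection marker cassette

total = reconstituted_length(HYG_ARM_BP, GRO_ARM_BP, OVERLAP_BP)
print(total, total == HYGRO_CASSETTE_BP)
```

In a split-marker design, neither arm alone encodes a functional marker, so the overlap is what allows homologous recombination to restore the complete cassette.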

Primer Name   Primer Sequence                              Corresponding Region
cdx10176      TCTTTCTGGCATGCGGAGAGACGG (SEQ ID NO: 199)    HYG (5′ arm) forward
cdx10177      TGTTGGCGACCTCGTATTGGGAAT (SEQ ID NO: 198)    HYG (5′ arm) reverse
cdx10178      TCTCGGAGGGCGAAGAATCTCGTG (SEQ ID NO: 199)    GRO (3′ arm) forward
cdx10179      TTCGGCCGGCGTATTGGGTGTTAC (SEQ ID NO: 200)    GRO (3′ arm) reverse
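The adapter design can be checked directly from the primer sequences listed in this Example: the first 24 bases of cdx111006 should be the reverse complement of the HYG forward primer (cdx10176), and the first 24 bases of cdx111009 should be the reverse complement of the GRO reverse primer (cdx10179). The Python sketch below performs that comparison. The sequences are copied from the primer tables with line-wrap spaces removed, and the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def adapter_matches(arm_primer, cassette_primer, adapter_len=24):
    """True if the 5' adapter of arm_primer is the reverse complement of cassette_primer."""
    return arm_primer[:adapter_len] == reverse_complement(cassette_primer)

# Flank primers carrying the 24-base adapters
cdx111006 = "CCGTCTCTCCGCATGCCAGAAAGATTCCTTCCCTTGCTCCTTCACACTG"
cdx111009 = "GTAACACCCAATACGCCGGCCGAACAAAAGCCATTCTTCCTCCGAGAC"

# Marker cassette end primers
cdx10176 = "TCTTTCTGGCATGCGGAGAGACGG"  # HYG (5' arm) forward
cdx10179 = "TTCGGCCGGCGTATTGGGTGTTAC"  # GRO (3' arm) reverse

print("cdx111006 adapter pairs with cdx10176:", adapter_matches(cdx111006, cdx10176))
print("cdx111009 adapter pairs with cdx10179:", adapter_matches(cdx111009, cdx10179))
```

A match means the flank amplicon ends in a sequence that can anneal to the corresponding end of the marker fragment, which is what the later fusion PCR step relies on.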

The HYG 5′ homolog arm was amplified using the following PCR parameters: denaturation at 95° C. for 2 min, followed by 35 cycles of 95° C. for 20 sec, 60° C. for 20 sec, 72° C. for 1 min, and final extension at 72° C. for 3 min. The 50 μl reaction volume contained 10 μl 5× HERCULASE® II reaction buffer (Agilent Technologies), 0.5 μl 25 mM dNTPs, 1 μl primer cdx10176 (10 mM), 1 μl primer cdx10177 (10 mM), 3% DMSO, 1 μl DNA template, and HERCULASE® II Fusion Enzyme (Agilent Technologies) with dNTPs Combo (Agilent Technologies); H₂O was added to 50 μl final volume.

The GRO 3′ homolog arm was amplified using the following PCR parameters: denaturation at 95° C. for 2 min, followed by 35 cycles of 95° C. for 20 sec, 60° C. for 20 sec, 72° C. for 1 min, and final extension at 72° C. for 3 min. The 50 μl reaction volume contained 10 μl 5× HERCULASE® II reaction buffer (Agilent Technologies), 0.5 μl 25 mM dNTPs, 1 μl primer cdx10178 (10 mM), 1 μl primer cdx10179 (10 mM), 3% DMSO, 1 μl DNA template, and HERCULASE® II Fusion Enzyme (Agilent Technologies) with dNTPs Combo (Agilent Technologies); H₂O was added to 50 μl final volume.

The 943 bp long 1086.g13 3′ homolog arm was amplified using the following PCR parameters: denaturation at 98° C. for 30 sec, followed by 35 cycles of 98° C. for 10 sec, 62° C. for 20 sec, and 72° C. for 30 sec, followed by final extension at 72° C. for 5 min. The 50 μl reaction volume contained 10 μl 5× PHUSION® GC buffer, 0.5 μl 25 mM dNTPs, 1 μl primer cdx111006 (10 mM), 1 μl primer cdx111007 (10 mM), 3% DMSO, 1 μl DNA template, and 0.5 μl PHUSION® Hot Start High-Fidelity Polymerase (Finnzymes); H₂O was added to 50 μl final volume.

The 852 bp long 1086.g13 5′ homolog arm was amplified using the following PCR parameters: denaturation at 95° C. for 2 min, followed by 35 cycles of 95° C. for 20 sec, 62° C. for 20 sec, 72° C. for 30 sec, and final extension at 72° C. for 3 min. The 50 μl reaction volume contained 10 μl 5× HERCULASE® II reaction buffer (Agilent Technologies), 0.5 μl 25 mM dNTPs, 1 μl primer cdx111008 (10 mM), 1 μl primer cdx111009 (10 mM), 3% DMSO, 1 μl DNA template, and HERCULASE® II Fusion Enzyme (Agilent Technologies) with dNTPs Combo (Agilent Technologies); H₂O was added to 50 μl final volume.

The sizes of the PCR fragments were checked on precast 1.2% EtBr E-gel (Invitrogen). Fragments were spin column purified (QIAQUICK® PCR Purification Kit; Qiagen), and eluted in 50 μl elution buffer.

To attach the 1086.g13 3′ homolog arm to the HYG fragment, a 50 μl reaction was prepared containing 10 μl 5× HERCULASE® II reaction buffer (Agilent Technologies), 0.5 μl 25 mM dNTPs, 1 μl primer cdx111007 (10 mM), 1 μl primer cdx10177 (10 mM), 3% DMSO, 0.5 μl 3′ arm (20 ng), 0.5 μl HYG fragment (20 ng), and HERCULASE® II Fusion Enzyme (Agilent Technologies) with dNTPs Combo (Agilent Technologies), with H₂O added to 50 μl final volume. The following PCR parameters were used: denaturation at 95° C. for 2 min, followed by 35 cycles of 95° C. for 20 sec, 61° C. for 20 sec, and 72° C. for 1.5 min, followed by final extension at 72° C. for 3 min. The size of the 1086-3′+HYG construct was 2502 bp.

To attach the 1086.g13 5′ homolog arm to the GRO fragment, a 50 μl reaction was prepared containing 10 μl 5× HERCULASE® II reaction buffer (Agilent Technologies), 0.5 μl 25 mM dNTPs, 1 μl primer cdx111008 (10 mM), 1 μl primer cdx10178 (10 mM), 3% DMSO, 0.5 μl 5′ arm (20 ng), 0.5 μl GRO fragment (20 ng), and HERCULASE® II Fusion Enzyme (Agilent Technologies) with dNTPs Combo (Agilent Technologies), with H₂O added to 50 μl final volume. The PCR parameters used were as follows: denaturation at 95° C. for 2 min, followed by 35 cycles of 95° C. for 20 sec, 58° C. for 20 sec, and 72° C. for 1.5 min, followed by final extension at 72° C. for 3 min. The size of the 1086-5′+GRO construct was 2459 bp.

Primer name   Sequence (5′-3′)
cdx111008     GGATAAGAGTGAACAACGACGAGC (SEQ ID NO: 196)
cdx10178      TCTCGGAGGGCGAAGAATCTCGTG (SEQ ID NO: 200)
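As a quick sanity check on the two primers listed above, their length and GC content can be computed directly from the sequences. The sketch below is illustrative only; the helper name gc_fraction is hypothetical, and no melting temperature is estimated because the method used to choose the annealing temperatures is not stated.

```python
# Primer sequences as listed (5'-3').
primers = {
    "cdx111008": "GGATAAGAGTGAACAACGACGAGC",  # SEQ ID NO: 196
    "cdx10178": "TCTCGGAGGGCGAAGAATCTCGTG",   # SEQ ID NO: 200
}


def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}")
```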

Both constructs (1086-3′+HYG and 1086-5′+GRO) were checked on precast 1.2% EtBr E-gel (Invitrogen). They were spin column purified (QIAQUICK® PCR Purification Kit; Qiagen), and eluted in 50 μl elution buffer. The two constructs were co-transformed in equal amounts (2 μg each) into CF-409 fungal protoplasts to obtain gene-deleted strains, as described below.
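Because the two constructs are of similar length (2502 bp and 2459 bp), equal masses correspond to nearly equal molar amounts. The following sketch illustrates the conversion. It assumes an approximate average mass of 650 g/mol per base pair of double-stranded DNA, a common rule of thumb that is not taken from the text; the function name is hypothetical.

```python
# Assumption: approximate average molar mass of one dsDNA base pair (g/mol).
AVG_BP_MASS_G_PER_MOL = 650.0


def pmol_dsdna(mass_ug, length_bp):
    """Approximate picomoles of a linear dsDNA fragment of the given mass and length."""
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return moles * 1e12


# Construct sizes as stated in the text; 2 ug of each was used.
constructs_bp = {"1086-3'+HYG": 2502, "1086-5'+GRO": 2459}
amounts_pmol = {name: pmol_dsdna(2.0, bp) for name, bp in constructs_bp.items()}

for name, pmol in amounts_pmol.items():
    print(f"{name}: {pmol:.2f} pmol")

# Molar ratio of the shorter construct to the longer one at equal mass.
ratio = amounts_pmol["1086-5'+GRO"] / amounts_pmol["1086-3'+HYG"]
print(f"Molar ratio (5'+GRO : 3'+HYG): {ratio:.3f}")
```

Under this approximation, 2 μg of each construct is roughly 1.2 pmol, and the two molar amounts differ by less than 2%.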

Transformation into M. thermophila cells (W1L100LΔAlp1Δchi1Δpyr5Δbgl1::pyr5Δku70::Hyg) was performed as described in Example 1. The transformants were incubated for 5 days at 35° C. under standard hygromycin-selective conditions known in the art. Colonies were re-streaked and checked for the deletion of the protease using PCR, as described in Example 1, above.

The protease-deleted strain was grown in fungal growth medium and incubated at 35° C., 250 rpm, 85% humidity for 2 days. An aliquot (10%) of this culture was then used to inoculate fungal growth medium comprising glucose, CSS (corn stover solids), and minerals, and incubated at 35° C. at pH 5.0 for 4 days. The clipping of the enzyme CBH1a was assessed by 2D gel (Biorad), which showed a detectable decrease in clipping in the protease-deleted strain compared to the control.

What is claimed is:
 1. A method for producing a genetically modified Myceliophthora deficient in at least one protease native to said Myceliophthora, comprising providing a Myceliophthora having protease activity, wherein said protease activity comprises lysing Myceliophthora cellobiohydrolase enzymes and said protease comprises an amino acid sequence having at least 98% identity with the polypeptide sequence set forth in SEQ ID NO:3, 6, 9, or 12; and mutating said Myceliophthora under conditions such that said protease is mutated to produce a protease-deficient Myceliophthora.
 2. The method of claim 1, wherein said Myceliophthora is Myceliophthora thermophila.
 3. The method of claim 1, wherein said Myceliophthora further produces at least one cellulase selected from endoglucanases, cellobiohydrolases, cellobiose dehydrogenases, endoxylanases, beta-xylosidases, arabinofuranosidases, alpha-glucuronidases, acetylxylan esterases, feruloyl esterases, and/or alpha-glucuronyl esterases.
 4. The method of claim 3, wherein said cellulase is a recombinant cellulase selected from EG1b, EG2, EG3, EG4, EG5, EG6, CBH1a, CBH1b, CBH2a, CBH2b, GH61a, GH61f, and/or GH61p.
 5. The method of claim 1, wherein at least one polynucleotide sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, or 11 have been deleted from the genome of said Myceliophthora, such that the Myceliophthora produces a reduced level of at least one protease, as compared to a wild-type Myceliophthora.
 6. The method of claim 1, wherein at least one polynucleotide sequence selected from SEQ ID NOS:1, 2, 4, 5, 7, 8, 10, or 11 have been mutated in the genome of said Myceliophthora, such that the Myceliophthora produces a reduced level of at least one protease, as compared to a wild-type Myceliophthora.
 7. The method of claim 5, wherein said Myceliophthora is Myceliophthora thermophila.
 8. The method of claim 6, wherein said Myceliophthora is Myceliophthora thermophila.